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Abstract 

We introduce a generalization of the Adaptive Multilevel Splitting algorithm in the 
discrete time dynamic setting, namely when it is applied to sample rare events associated 
with paths of Markov chains. By interpreting the algorithm as a sequential sampler in 
path space, we are able to build an estimator of the rare event probability (and of any non- 
normalized quantity associated with this event) which is unbiased, whatever the choice of 
the importance function and the number of replicas. This has practical consequences on 
the use of this algorithm, which are illustrated through various numerical experiments. 


1 Introduction 

The efficient sampling of rare events is a very important topic in various application fields 
such as reliability analysis, computational statistics or molecular dynamics. Let us describe 
the typical problem of interest in the context of molecular dynamics. 

1.1 Motivation and mathematical setting 

Let us consider the Markov chain defined as the discretization of the overdamped 

Langevin dynamics: 

Vt G N, Vi+i - Vt = -VV(Vi) h + - Wt). (1) 

Typically, Xt G is a high-dimensional vector giving the positions of N particles in 
at time th [h > 0 being the time step size), V : —)• M is the potential function (for 
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any set of positions x G , V{x) is the energy of the configuration), = ksT is the 
inverse temperature and Wt is a standard Brownian motion (so that Wt+h ~ is a vector 
of 3N i.i.d. centered Gaussian random variables with variance h). In many cases of interest, 
the dynamics Q is metastable: the N particles remain trapped for very long times in some 
so-called metastable states. These are for instance regions located around local minima of V. 
This actually corresponds to a physical reality: the timescale at the molecular level (given by 
h, which is typically chosen at the limit of stability for the stochastic differential equation) 
is much smaller than the timescales of interest, which correspond to hopping events between 
metastable states. Let us denote by A C and B C two (disjoint) metastable 

states. The problem is then the following: for some initial condition outside A and B, how to 
efficiently sample paths which reach B before A1 In the context of molecular dynamics, such 
paths are called reactive paths. The efficient sampling of reactive paths is a very important 
subject in many applications since it is a way to understand the mechanism of the transition 
between metastable states. In mathematical terms, one is interested in computing, for a 
given test function ip : (M^-^)^ —)• M depending on the path {Xt)t^n of the Markov chain, the 
expectation 

E(^((Xt)i6N) 

^tb<ta ) ( 2 ) 

where ta = inf{t € N : Xt € A}, tb = inf{t G N : G B} and Xq = xq ^ (AuB) is assumed 

(for simplicity) to be a deterministic initial position close to A: most trajectories starting from 
Xq hit A before B. p = 1, the above expectation is P(tb < ta), namely the probability 
that the Markov chain reaches B before A. This is typically a very small probability: since A 
is metastable and xq is close to A, for most of the realizations, ta is smaller than tb- This is 
why naive Monte Carlo methods will not give reliable estimates of (|^. We refer for example 
to milO] for some examples in the context of molecular simulation. 

1.2 The adaptive multilevel splitting algorithm 

Many techniques have been proposed in the literature in order to compute quantities such 
as Q, in particular control variate techniques, importance sampling methods and splitting 
algorithms (see for example the monograph on rare event simulations). Here, we focus 
on the Adaptive Multilevel Splitting (AMS) method which has been proposed in [8]. Let us 
roughly describe the principle of the method. The crucial ingredient we need is an importance 
function: 

^ ^ M (3) 

which will be used to measure the advance of the paths towards B. This function is known as 
a reaction coordinate in the molecular dynamics community, and this is the terminology we 
will use here. In this paper, we will also call ^{Xf) the level of the process Xt at time t. A 
useful requirement on ^ is the existence of Zmax G M such that 

B C {x e : ^(x) G]zmax, oo[}. 

For any path of the Markov chain, we call the maximum level of this path the quantity 

snp{C{XtATA)t&n}- 

Then, starting from a system of n^ep replicas (all starting from the same initial condition xq 
and stopped at time ta), the idea is to remove the worst fitted paths and to duplicate the best 
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fitted paths while keeping a hxed number of replicas (we will discuss below generalizations 
of the AMS algorithm where the number of replicas may vary). The worst htted paths are 
those with the smallest maximum levels sup{^(AriAT^)teN}- As soon as one of the worst htted 
paths is removed, it is replaced by resampling of one of the best htted path: the new path is 
a copy of the best htted path up to the maximum level of the removed paths, and the end of 
the trajectory is then sampled using independent random numbers. The algorithm thus goes 
through three steps: (i) the level computation step (to determine the level under which paths 
will be removed: this level is computed as an empirical quantile over the maximum levels 
among the replicas); (ii) the splitting step (to determine which paths will be removed and 
which ones of the remaining best htted paths will be duplicated); (iii) the resampling step (to 
generate new paths from the selected best htted paths). By iterating these three steps, one 
obtains successively systems of rirep paths with an increasing minimum of the maximum levels 
among the replicas. The algorithm is stopped when the current level is larger than Zmax, and 
an estimator of ([^ is then built using a weighted empirical average over the replicas. The 
adaptive feature of the algorithm is in the hrst step (the level computation step): indeed, at 
each iteration, paths are removed if their maximum level is below some threshold, and these 
thresholds are determined iteratively using empirical quantiles, rather than by hxing a priori 
a deterministic sequence of levels (as it would be the case in non-adaptive splitting, or more 
generally in standard sequential Monte Carlo algorithms, see mm)- All the details of the 


algorithm will be given in Section 2.5 


In this work we focus on the application of the AMS algorithm to sample Markov chains, 
namely discrete time stochastic dynamics, and not continuous time stochastic dynamics as 
in |8] for example. The reason is mainly practical: in most cases of interest, even if the 
original model is continuous in time, it is discretized in time when numerical approximations 
are needed. There are actually also many cases where the original model is discrete in time (for 
example kinetic Monte Carlo or Markov State Models in the context of molecular dynamics). 

The discrete time setting, which is thus of practical interest, raises specihc questions in the 
context of the AMS algorithm. First, in the resampling step, a natural question is whether 
the path should be copied up to the last time before or hrst time after it reaches the level of 
the removed paths. Second, in the discrete time context, it may happen that several paths 
have exactly the same maximum level. This implies some subtleties in the implementation of 
the splitting step which have a large inhuence on the quality of the estimators, see Section [5.1| 


1.3 Main results and outline 

The main results of this work are the following: 

• We prove that the AMS algorithm for Markov chains with an appropriate implementa¬ 
tion of the level computation and splitting steps yields an unbiased estimator of the rare 
event probability, and more generally of any non-normalized expectation related to the 
rare event of the form ([^ . We actually prove this unbiasedness result for a general class 
of splitting algorithms which enter into what we call the Generalized Adaptive Multilevel 
Splitting (GAMS) framework. 

• Using this GAMS framework, we propose various generalizations of the classical AMS 
algorithm which all yield unbiased estimators, in particular to remove extinction and to 
reduce the computational cost associated with sorting procedures. Moreover, we explain 
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how to use the general setting to sample other random variables than trajectories of 
Markov chains. 


We illustrate numerically on toy examples the importance of an appropriate imple¬ 
mentation of the level computation and splitting steps in the AMS algorithm to get 
unbiasedness. We also discuss through various numerical experiments the influence of 
the choice of the reaction coordinate ^ on the variance of the estimators and we end up 
with some practical recommendations in order to get reliable estimates using the AMS 


algorithm (see Section 5.4). In particular, using the unbiasedness property proven in 
this paper, it is possible to compare the results obtained using different parameters (in 
particular different reaction coordinates) in order to assess the quality of the numerical 
results. 

Compared to previous results in the literature concerning the AMS algorithm, the main 
novelty of this work is the proof of the unbiasedness in a general setting and whatever the 
parameters: the number of replicas, the (minimum) number of resampled replicas at each 
iteration and the reaction coordinate The proof of unbiasedness relies on the interpretation 
of the AMS algorithm as a sequential Monte Carlo algorithm in path space with the reaction 
coordinate as a time index, in the spirit of )16| (the selection and mutation steps respectively 
corresponds to the branching step and the resampling step in the AMS algorithm). This 
analogy is made precise in Section 3.4 In previous works, see for instance [511151122], unbi¬ 


asedness is proved in an idealized setting, namely when the reaction coordinate is given by 
^(x) = ^x{tb < ta) (known as the committor function; here, the subscript x G indicates 
that the Markov chain Xt has x as an initial condition), and for a different resampling step, 
where new replicas are sampled according to the conditional distribution of paths conditioned 
to reach the level of the removed replicas. In many cases of practical interest, these two 
conditions are not met. 

In addition, we illustrate through extensive numerical experiments the influence of the 
choice of ^ on the variance. Indeed, as for any Monte Carlo algorithm, the bias is only one 
part of the error when using the AMS algorithm: the statistical error (namely the variance) also 
plays a crucial role in the quality of the estimator as will be shown numerically in Section 
There are unfortunately very few theoretical results concerning the influence of the choice 
of ^ on the statistical error. We refer to H El for an analysis of the statistical error. For 
discussions of the role of ^ on the statistical error, we also refer to midaEo]. In particular, 
in the numerical experiments, we discuss situations for which the confidence intervals of the 
estimators associated with different reaction coordinates do not overlap if the number of 
independent realizations of the algorithm is not sufficiently large. We relate this observation 
to the well-known phenomenon of “apparent bias” for splitting algorithms, see |14j . 

We would like to stress that our results hold in the setting where a family of resampling 
kernels indexed by the levels is available (see Section 3.1.2 for a precise definition). This is 


particularly well suited to the sampling of trajectories of Markov dynamics (see Section 3.5.4 
for another possible setting). In the terminology of |16) . we have in mind the dynamic setting 
(considered for example in |8|), and not the static setting (considered for example in [3 E])- 
The paper is organized as follows. The AMS algorithm applied to the sampling of paths 
of Markov chains is described in Section This algorithm actually enters into a more gen¬ 
eral framework, the Generalized Adaptive Multilevel Splitting (GAMS) framework which is 
described in detail in Section The interest of this generalized setting is twofold. First, it 
is very useful to write variants of the classical AMS algorithm which will still yield unbiased 
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estimators (some of them are described in Section 3.5). Second, it highlights the essential 
mathematical properties that are required to produce unbiased estimators of quantities such 
as Q. This is the subject of Section which is devoted to the main theoretical result of this 
work: the unbiasedness of some estimators, including estimators of Q. Finally, Section is 
entirely devoted to some numerical experiments which illustrate the unbiasedness result, and 
discuss the efficiency of the AMS algorithm to sample rare events. 


1.4 Notation 

Before going into the details, let us provide a few general notations which are useful in the 
following. 

• The underlying probability space is denoted by (fl, P). We recall standard notations: 
for m cr-fields d T^ .. .\IT^ denotes the smallest u-field on Q containing 

all the fj-fields ..., J-"^. For any t, s £ N, t A s = min{t, s} and tV s = max{t, s}. 
We use the convention inf 0 = +oo. For two sets A and B which are disjoint, AL\ B 
denotes the disjoint set union. 


We work in the following standard setting: random variables take values in state spaces 
£ which are Polish (namely metrizable, complete for some distance dg and separable). 
The associated Borel fi-field is denoted by B{£). We will give precise examples below 
(see for example Section 2.1 for the space of trajectories for Markov chains). 


Then Proba(iS) denotes the set of probability distributions on £. It is endowed with 
the standard Polish structure associated with the Prohorov-Levy metric which metrizes 
convergence in distribution, i.e. weak convergence of probabilities tested on continuous 
and bounded test functions (see for example [T]). 


The distribution of a iS-valued random variable X will be denoted by Law(A). 


• If £i and £2 are two Polish state spaces, a Markov kernel (or transition probability 
kernel) Ii{xi,dx 2 ) from £i to £2 is a measurable map from initial states in xi £ £ 1 , to 
probability measures in Proba(i? 2 )- 


• We use the following standard notation associated with probability transitions: for (p : 
i ?2 —t M a bounded and measurable test function, 

U{(p){xi) = f ip{x 2 )Il{xi,dx 2 ). (4) 

d X2&£2 

Similarly, we use the notation Tr{(p) = (p{x)7r{dx) for vr G Proba(<52). 

• Let Xi and X 2 be random variables respectively with values in £i and £2 and 11 a Markov 
kernel from £i to £ 2 . In the algorithms we describe below, we will use the notion of 
conditional sampling: X 2 is sampled conditionally on Xi with law n(Ai, .) (denoted by 
X 2 ~ II(Aii, .)) rigorously means that X 2 = f{Xi,U) a.s. where, on the one hand, U 
is some random variable independent of Xi and of all the random variables introduced 
before (namely at previous iterations of the algorithm) and, on the other hand, / is a 
measurable function which is such that n(xi, .) = Law(/(xi, U)), for Law(Xi)-almost 
every xi £ £ 1 . 
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• A random system of replicas in £ is denoted by 




card / < + 00 , 


(5) 


where / C N* is a random finite subset of labels and are elements of £.. The 

space is endowed with the following distance: for and = 




2 

minj^^g^i if = P. 


Endowed with this distance, the set is Polish and we denote by the Borel 

cj-field. This cj-field can also be written as follows: 


/ex 

where Z denotes the ensemble of finite subsets of N* (which is a discrete set). 


• When we consider systems of weighted replicas, to each replica of the system X with 
label n G I is attached a weight G M+, and we use the notation X = (^X^P , 

The topological setting is the same as in the previous item, £ being replaced by the 
augmented state space T x M. 


2 The AMS algorithm for Markov chains 

The two goals of this section are to define the AMS algorithm applied to paths of a Markov 
chain (namely a discrete time stochastic process) and to introduce unbiased estimators in this 
setting. 

A special care should be taken to treat the situations when many replicas have the same 
maximum level, or the situations when there is extinction of the population of replicas. These 
aspects which are specific to the discrete time setting were not treated in details in many 
previous works where continuous time diffusions were considered. 

2.1 The Markov chain setting 

Let X = {Xt)t£n a Markov chain defined on a probability space P), with probability 

transition P. We assume that Xt takes values in a Polish state space S. Without loss of 
generality, we assume that Xq = xq where xq G 5 is a deterministic initial condition. The 
generalization to a random initial condition Xq is straightforward. 

The path space is denoted by 

V = {x = {xt)t&n : Xt G S for all t G N} . (6) 

It is well-known that, by introducing the distance d'p{x,y) = Ylt&n ^ ^^Ps<tds{xs,ys)) 
(which is a metric for the product topology), the space {V,dp) is complete and separable. We 
denote by B(V) the corresponding Borel a-field. We thus see A as a random variable with 
values in V. 

The set of paths (V,B{V)) is endowed with the natural filtration in time {Bt)t£n- C 
B{V) is the smallest fi-field such that (x5)5gN G V (xi,..., xt) G is measurable. Observe 
that the natural filtration for a given Markov chain X defined on P) with values in S 

is then given by the pullback of the filtration {Bt)t£n t>y X : {£i,P) -G {V,B(V)). 
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2.2 The rare event of interest 

Given two disjoint Borel snbsets A and B of S, onr main objective is the efficient sampling of 
events snch as {tb < ta} where 

ta = inf G N : Xt E and tb = inf G N : G 

are respectively the first entrance times in A and B. Both ta and tb are stopping times with 
respect to the natnral filtration of the process X. 

We are mainly interested in the estimation of the probability ¥{tb < ta) in the rare event 
regime, namely when this probability is very small (typically less than 10“®). This occnrs for 
example if the initial condition xq G H B^ is snch that xq is close to A, and A and B are 
metastable regions for the dynamics. The Markov chain starting from (a neighborhood of) A 
(resp. B) remains for a very long time near A (resp. B) before exiting, and thns, the Markov 
chain starting from xq reaches A before B with a probability close to one. Specific examples 
will be given in Section 

Let ns introdnce the Markov chain stopped at time ta'- X = {Xt)^^^ where 

Xt = Xt^rA for any t G N. (7) 

The probability distribntion of the stopped Markov chain X (seen as a "P-valned random 
variable) is denoted by 

TT = Law (X) G Proba('P). (8) 

The probability of interest can be rewritten 

F{tb <ta) =F (1tb(x)<Ta(x)) 
where we denote for any path x € V 

Ta(x) = inf {t G N : Xi G A} and Tb(x) = inf {t G N : Xt G B} . 

More generally, the algorithm allows ns to estimate expectations of the following form: 

7r(<^)=E((/p(X)), (9) 

for any observable <y9 : "P —)• M snch that 7r(|(/?|) is finite. 

Remark 2.1 (On the stopping times ta and tb)- We defined above the stopping times as 
first entrance times in some sets A and B. As will become clear below, the definition of the 
algorithm and the nnbiasedness resnlt only reqnire ta and tb to be stopping times with respect 
to the natnral filtration of the chain X {i.e. A a and T^ to be stopping times on (V,B{V)) 
endowed with the natnral filtration). 

2.3 Reaction coordinate 

The crncial ingredient we need to introdnce the AMS algorithm is an importance fnnction, also 
known as a reaction coordinate or an order parameter in the context of molecnlar dynamics. 
This is a measnrable M-valned mapping defined on the state space S: 

C'.S ^R. 
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The choice of a good function ^ for given sets A and i? is a difficult problem in general. One 
of the main aims of this paper is to show that whatever the choice of it is possible to define 
an unbiased estimator of (§. The only requirement we impose on ^ is that there exists a 
constant Zmax £ such that 

B C (]^;max, +Oo[) . (10) 

In what follows, the values of ^ are called levels and we will very often refer to the maximum 
level of a path, defined as follows: 

Definition 2.2. For any path x €V, the maximum level of x is defined as the supremum of 
^ along the path x stopped at T^(x).' 

E{x) = sup{C(xi;\T^(,j,))tgN} e M U {+ 00 } . (11) 

The function H can be seen as a reaction coordinate on the path space V. 

We also introduce for any level 2 : G M and any path x € V 


Tfix) = inf{t G {0,..., Ta(x)} : f{xt) > z}, 


( 12 ) 


which is the first entrance time of the path x stopped at Tyi(x) in the set Too)). We 

emphasize on the strict inequality in the above definition of the entrance times T^(x): it is one 
of the important ingredients of the proof of the unbiasedness of the estimator of ([^. Notice 
that the above assumption (10) on B is equivalent to the inequality 


Vx G V, T 2 ^^^(x) < inf{t G {0,..., Ta(x)} : x* G B}. 


We denote by 

r, = TfiX) = inf{t G {0,..., ta} : > z}. (13) 

the entrance time associated with the (stopped) Markov chain X. It is a stopping time for 
the natural filtration of the Markov chain. 

Remark 2.3 (On Assumption dTo])). Assumption ( |10[ ) is extremely useful in practice when 
computing approximations of averages of the form E {(p{X)\rB<TA)'- allows to remove from 
memory the replicas which are declared “retired” in the splitting step at each iteration in the 
AMS algorithm described in the sequel, since by construction we know in advance that they 
will not contribute to the computation of the associated estimator. The algorithm thus only 
requires to retain a fixed number of replicas, denoted by Urep below. 


2.4 Resampling kernel 


The AMS algorithm is based on an interacting system of weighted replicas. At each iteration, 
copies of the Markov chain, called replicas, are simulated (in parallel) and are ranked according 
to their maximum level (11). The less fitted trajectories, i.e. those with the lowest values 
of the maximum levels, are resampled according to a resampling kernel. For any 2 : G M the 
resampling kernel tTz (which is in fact a transition probability kernel from M x "P to P) is 
denoted by 

f P —^ Proba(P) 

X I— )■ -Kzix-, dx) 




(14) 
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and is defined as follows: for any x G V, tTz{x, dx') is the law of the "P-valned random variable 
Y snch that 

(Yt=xt ift<Tz{x) 

|Law(yt|n, 0<s<t-l) = P(yt_i, .) ift>T,(x) 

and is stopped at Tyi(y) when Y hits A. We recall that P is the transition kernel of the 
Markov chain X. In other words, for t < T 2 (x), Y is identically x, while for t > Yz{x)-, 
Yt is generated according to the Markov dynamics on S, with probability transition P, and 
stopped when reaching A. We thns perform a branching of the path x at time Yz{x) and 
position 

Notice from the dehnition of the resampling kernel that if H(x) < z, then the resampling 
kernel does not modify x: Yz{x) = +oo and tTz{x, dx') is a Dirac mass: Yt = Xtt^TAix) 
t G N. 

One of the important ingredients to prove the nnbiasedness of the estimator of (§ is 
the following right-continnity property: for all x G P, and for any continnons bonnded test 
fnnction y? : "P —)> M, the mapping 




/ ip{x')irz{x,dx') 

Jr 


is right-continnons, see Section 3.3 for a proof of this statement. 


2.5 The AMS algorithm 

In this section, we introdnce the AMS algorithm in the specihc context of sampling of Markov 
chain trajectories. The associated nnbiased estimator of ([^ is given in the next section. A 
generalized adaptive mnltilevel splitting framework which encompasses the AMS algorithm 
described here will be provided in Section 


Short description of the AMS algorithm In addition to the choice of the reaction 
coordinate two other parameters of the algorithm need to be specihed: n^ep, the nnmber of 
replicas, and k G {1,..., Urep — 1} the (minimnm) nnmber of replicas resampled at each step of 
the algorithm. At the g-th iteration, the replicas of the Markov chain X are denoted by 
where n is the label of the replica. The algorithm dehnes a non-decreasing seqnence of random 
levels At the beginning of iteration q, the level is the k-th order statistics of the 

maximnm levels of the “working” replicas (the notion of “working” replicas is dehned in 

the fnll description of the algorithm). Then, all replicas with maximnm levels lower or eqnal 
to ^(9) are declared “retired", and resampled in order to keep a hxed nnmber n^ep of replicas 
with maximnm level strictly larger than Z^9) ^s explained above, the resampling procednre 
consists in dnplicating one of the replica snch that its maximnm level is larger than 
np to the time T^(q), and then in nsing the resampling kernel, which amonnts in completing 
the trajectory np to time with the Markov transition kernel P. The set of labels of the 
TT-rep replicas obtained at the end of the g-th iteration and which, by constrnction, all have a 
maximnm level larger than Z^9) jg denoted by The snbscript “on” indicates that these 

are the replicas which have to be retained to pnrsne the algorithm (the so-called “working” 
replicas in the terminology introdnced below). Consistently, the snbscript “off” refers to the 
“retired” replicas at a given iteration. The algorithm stops either if the level Zmax is reached 
(namely > Zmax) or if all the replicas at the end of iteration q have maximnm levels lower 
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than or equal to the /c-th order statistics. When the algorithm stops, an unbiased estimator 
of (|^ is then defined as a weighted empirical average over the replicas. 

Pull description of the AMS algorithm The AMS algorithm generates iteratively a 
system of weighted replicas in the state space using selection and resampling steps. 

In order to define an estimator of 7r((/9) for any observable (p, we will need to consider 
the set of all the labels of the replicas which have been declared retired before iteration 
q (namely those with a maximum level smaller or equal than We will denote by 

= Jon U los labels of the replicas generated by the algorithm up to 

iteration q. We recall that U denotes the disjoint set union. Notice that the cardinal of 
is increasing, while card Ion = ''^rep for any q. At step q, the replicas with labels in Ion are 
referred to as the working replicas, and the replicas with labels in 1^^ as the retired replicas. 
The unbiased estimator of (|^ associated with the AMS algorithm will be defined in the next 
section. 

We are now in position to introduce the AMS algorithm in full detail (see Figure for a 
schematic representation of one iteration of the algorithm). 


The initialization step (q = 0) 

(i) Let be i.i.d. replicas in V distributed according to vr, defined by ([^. 

At this initial stage, all replicas are working replicas i.e. = lo^ = {!,... ,nrep} and 



(ii) Initialize uniformly the weights: = 1/nrep for n G {!,... ,nrep}. 

(iii) Compute the order statistics of : namely a permutation of the set 

of labels I® = Urep} such that 

and set the initial level as the fc-th order statistical] 

^( 0 ) ^ 

(iv) If card |n G Ion : = Urep, then set = +oo. 


Iterations Iterate on q >0, while the following stopping criterion is not satisfied. 


The stopping criterion If Z^^'> > Zmaxj then the algorithm stops. When it is the case, set 
Qiter = Q- Else perform the following four steps. 


^Notice that is not necessarily unique since several replicas may have the same maximum level. Nev¬ 
ertheless, the level does not depend on th e ch oice of E^°L The same remark applies to the definition of 
the level iteration q > 0, see Remark ! 


2.4 


10 




The 


splitting (branching) step 

Consider the following partition of the working replicas’ labels in Ion- 

j{q) — ril) y r(9) 
on on,<Z(i') on,>Z(i) 

where replicas with maximum level smaller or equal than have labels in 


I 


(<?) 


on,<Z(i) 




while the set of replicas’ labels with maximum level strictly larger than is 






{n G > Z^'')} 


We set ^( 5 + 1 ) = card/^'^^ 

Aq) 


<zM 


= ■'^rep — card/ 


{q) 


on,>Z('j) 


. Notice that //('J+i) > xhe 


set /on denotes the working replicas at the beginning of iteration q. Among them, the 
replicas with labels in <^( 5 ) will be declared retired and replaced by branching replicas 

with labels in using the resampling kernel explained below. Notice that 

necessarily, card ^ 1 (otherwise Z^'^^ = +00 and the stopping criterion has been 

fulfilled before entering the splitting step of iteration q). 

Introduce a new set /nlw^^ = {card/*-'^^ + 1,..., card/(5) + G N* \ /(^) of labels 

for the new replicas sampled at iteration q. 


Ill 


Ai) 


on,>ZM 


as follows: the la- 


Define the children-parent mapping P^^+i) ^ /nlw^^ 
bels (card/('?) are //(i+i) random labels independently and uni¬ 
formly distributed in • 

This mapping associates to the label of a new replica the label of its parent. The parent 
replica (with label in I^y^(q)) is used in the resampling procedure to create the new 

replica (with label in /nlw^^)- 


(iv) For any n G > th® branching number 


j^{'n,q+A = 1 _|_ card |n' G /new^^ • = n| 


(16) 


represents the number of offsprings of The replica will be split into /^("-’'i+i) 

replicas: the old one with label n G > 1, /?(’^’'J+i) _ i 

new ones with labels n' G /nlw^^ such that = n . 

The sets of new labels are then updated as follows: 

r(g+i) _ AA) I I r(g+i) Aq+A _ riq) I I jiq) r( 9 +i) — riq+A 1 1 r('?+i) 

0 ° “ on,>Z('?) new > -'off “ ''off -'on,<Z(9) ’ '' —-'on '-'-'off ’ 


Notice that by construction card /on"*"^^ = n. 


rep* 
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The weights are updated with the following rule: 


G'l’T’.'j+i) 


Q<(n,Cl) 

^rep 

q(^P(q+t^) (^n) ,q+l) 


n 

n 

n 


G I 
( 

G Inew 


(g+1) 

off 

(g) 

on,>2(9) 

(g+1) 


Observe that for n G /on^^\ 


^(n,g+l) _ 'Gep 


- n^ep - 


rirep - 1 


n. 


rep 


n 


rep 


n 


rep 


n 


rep 


( 17 ) 


Moreover, the weight of a replica remains constant as soon as it is retired (namely from 
the hrst iteration q such that its label is in 


The 

(i) 

(ii) 


resampling step 

Replicas in are not resampled: for n G 

For n' G/new^\ is sampled according to the 


Section 


2.4) ,dx'). The new replica 


branching its parent replica 


resampling kernel (dehned in 
is thus obtained by 


The level computation step Compute the order statistics of (^(X^"'’'^'''^))) .( 9 + 1 ), namely 

a bijective mapping : { 1 ,... ,nrep} —t Fon"*"^^ (we recall that card/on"*"^^ = '^rep) such 

that 

and set the new level as the fe-th order statistics: 


If card |n G = Urep then set = + 00 . 


(18) 


Increment Increment q ■<— g + 1, and go back to the stopping criterion step. 
Notice that Quer is such that 


Qiter = inf{g > 0 : ^ ■2'max}' 

The number of times the loop consisting of the three steps (splitting / resampling / level 
computation) is performed is exactly Qiter- 

If = _|_oo^ none of the working replicas at the iteration Qiter — 1 is above the 

new level and thus, all of them would have been declared retired at the 

iteration Qiter^ this situation is referred to as extinction. 
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Figure 1: Schematic representation of the first iteration of the AMS algorithm, with Urep = 4 
and k = 2. The replicas numbered 2 and 4 are declared retired at the first iteration, and are 
replaced by the replicas with label 5 and 6, which are respectively resampled from the replicas 
with labels 3 and 1. 

Remark 2.4 (On the number of resampled replicas). It is very important to notice that the 
number of resampled replicas is at least k, but may be larger than k. In other words, at 
iteration q, with the above notation, is not necessarily equal to k. This requires at 

least two replicas to have as the maximum level at the beginning of iteration q. Actually, 
it may even happen that, in the level computation step, all the replicas in have as 

the maximum level, which implies extinction: card |n G = rirep, 

^(9+1) = _|_oo and the algorithm stops. 

As an example, let ns explain a three step procedure which leads two replicas to have the 
same maximum level, which in addition is the minimum of the maximum levels over all the 
working replicas, in the case k = 1. 

Assume that in the splitting and resampling steps, the three following events occur (see 
Figure]^ for a schematic representation): 

1. One of the selected replica (referred to as X) has the smallest maximum level among all 

the others in . 

on,>Z'‘i> 

2. The first time TzM where this replica goes beyond the current level Z^*?^ also corre¬ 
sponds to the time at which this replica reaches its maximum level: 

3. By the resampling procedure, the new replica (referred to as Y) which is generated 

(starting from time T 2 {q)(A), see the resampling kernel (|14[)~([I^) is such 

that ^(Tfc) < ^(Arp (g)(x)) = ^(^) foi' ^ > T^{ 5 )(A). Thus the new replica Y has 
the same maximum level as the selected replica A: H(y) = H(A). 

Then, at the next iteration, one obtains two replicas which have the same maximum level, 
which is the minimum of the maximum levels over all working replicas. As a consequence, 
both replicas will be resampled at the next iteration of the algorithm, even if A; = 1. 

By iterating the procedure, one can thus obtain many replicas having this same maximum 
level, or even all replicas having the same maximum level (which leads to extinction). Other 
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Figure 2: Schematic representation of a procedure leading to the equality of the maximum 
levels (y-axis) of two replicas. We represent the evolution of the level of each replica at discrete 
times (x-axis). Left: the current level is the maximum level of the green-crosses replica. 
The blue-squares replica has been selected to be resampled. Right: action of the resampling, 
where the new black-crosses replica has the same maximum level as the selected replica. Here 
A = {x : ^{x) < —0.5} and B = {x : ^(x) > 1}, and Zmax = 1- 


similar procedures can lead to the equality of the maximum levels of two (or more) different 
replicas. For instance, even if the selected replica in the first step of the procedure above 
does not have the smallest maximum level among others (the first step is thus skipped), the 
two next steps will still create two replicas with the same maximum level. This implies that 
in some next iteration of the algorithm more than k replicas will be declared retired and 
resampled. 

The three events above have small probabilities especially if Urep is large, or if the time-step 
size is small if one thinks of the Markov Chain as the time discretization of a continuous time 
diffusion process (such as 0 )- But in practice, over many iterations and many independent 
runs, such situations are actually observed and must be taken into account carefully in the 
definition of the algorithm. In Section 5.1, we investigate in a simple test case the phenomenon 
described in this remark, and we illustrate the importance of a proper implementation of the 
splitting and resampling steps in such situations, in order to obtain unbiased estimators. 


2.6 The AMS estimator 


For any bounded observable : "P —)> M, a realization of the above algorithm gives an estimator 
of the average vr((^) (see (|^) defined by: 

if = ^ Q(n,QiteT)^p(^X^n,Qit^^)y 

nG/^^iter) 


One of the main aim of this paper is to prove that this estimator is unbiased (see Theorem 4.1 
below): 

E((,J) = 7r{(p). 


In order to highlight the main features which make the estimator unbiased, we will actually 
prove this result for a larger class of models and algorithms introduced in Section 

A particular choice of interest for some applications is ^p{x) = lTs(a:)<TA(a;)) which case 
one obtains an unbiased estimator of the probability p = W‘(tb < ta)- In this case, due to 


the assumption (10) on B, only replicas with labels in 


contribute to the estimation. 
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and thus, from one iteration to the other, only replicas with labels in /on have to be retained, 
namely a system of n-rep replicas. For this specific observable ip{x) = ^'Tg[x)<TA{x)-: the 
estimator of p = ^{tb < ta) is denoted by p and is defined from (19) as: 


V = 




Ts(X("’‘3iter))<TA(X("’‘5 iter) ^ 


n,ep - //(*5ited n^ep - //(I) „ 

... 1 rt 


Tlr 


'rep ^rep 

where the so-called “corrector term” is given by 

1 


p — 
^ corr — 


n 


E 1 




TB(X(’"’'5iter))<TA(X("-'3iter)) 


( 20 ) 


( 21 ) 


namely the proportion of working replicas that have reached B before A at the final iteration. 
The properties of this estimator will be numerically investigated in Section 


3 Generalized Adaptative Multilevel Splitting 


In this section, we introduce a general framework for adaptive multilevel splitting algorithms, 
which contains in particular the AMS algorithm of Section We refer to this framework as 
the Generalized Adaptive Multilevel Splitting (GAMS) framework. In particular, we prove in 
Section 3.3 that the AMS algorithm of Sectionfits in the GAMS framework. The interest of 
this abstract presentation is twofold. First, it highlights the essential mathematical properties 
that are required to produce unbiased estimators of quantities such as (|^. As will be proven 
in Section]^ any algorithm which enters into the GAMS framework yields unbiased estimators 
of quantities such as (§. Second, it is very useful to propose variants of the classical AMS 


algorithm which still yield unbiased estimators: we propose some of them in Section 3.5 


The section is organized as follows. In Section |3.H we introduce in a general setting the 
quantities we are interested in computing, and the main ingredients we need to state the GAMS 


framework. In Section 3.2 the GAMS framework is presented. In Section 3.3 we prove that 
the AMS algorithm introduced in Section]^ for the sampling of paths of Markov chains enters 
into the GAMS framework. There is a strong analogy between GAMS and Sequential Monte 
Garlo algorithms (the branching step and the resampling step below correspond respectively 


to the so-called selection and mutation steps): this is made precise in Section 3.4 Finally, we 


propose in Section 3.5 some variants of the classical AMS algorithm to illustrate the flexibility 
of the GAMS framework. 


3.1 The general setting 

In this section, we introduce the ingredients and the main assumptions we need in order to 
introduce the GAMS framework. Throughout this section, the notations are consistent with 
those used in the context of the AMS algorithm. 

Let (11, T, P) be a probability space. Let us introduce the state space V, which is assumed 
to be a Polish space and let us denote B{V) its Borel cr-field. For example, in Section]^ the 
state space is the path space of Markov chains (see (|^). Let A be a random variable with 
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values in {V,B(V)) and probability distribution 

TT = P o X~^ G Proba('P). 

The aim of the algorithms we present is to estimate 

= / ^{x)T:{dx) ( 22 ) 

Jv 

for a given bounded measurable observable (^ : "P —)> M. 

The two main ingredients we need in addition to is a filtration on (V,B{'P)) 

(from which we build filtrations on (P’'®P, P(P'’®P)) and on (Qjp,P)) and some probability 
kernels Trz{x, •) from M x P to P. Let us introduce them in the next two sections. 

3.1.1 The filtrations 

In order to define the GAMS framework, we need an additional structure on {V, B{V)), namely 
a filtration indexed by real numbers that we call in the following “levels”. Therefore, we assume 
in the following that the space (P, P(P)) is endowed with a filtration indexed by levels z G M 

(filG),gK C P(P), (23) 

namely a non-decreasing family of cr-fields: for any z < z', filt^ C filt^/ C B{V). For example, 
in the context of Sectionj^ the filtration is defined as follows: for any z G M, filt^ is the smallest 
cj-field on P which makes the application x gV ^ i^tAT;,{x))t>o £ {'P,B(V)) measurable. 

From the filtration given by ( |23[ ), we construct a filtration (filt^®P)j,g]g on the space of 
replicas (P’’®p, i3(P’^®P)) (defined by (1^) by considering the disjoint union of the filtration filtz 
on P: 

filtf P = □ (611.)“"*' 

/ex 

where, we recall, X denotes the ensemble of finite subsets of N*. 

For any random variable X : (n,P) —?■ (P,P(P)), we define a filtration (filt^)^gjg on the 
probability space by pulling-back the filtration (filt^)^g][ 5 : 

mtf = A-^(filG). (24) 

If A = £ pi'^P denotes a random system of replicas - i.e. a random variable 

X : (n, P) —)• (P''®P, P(P’’®P)) - we also define the filtration (filt^)^gjg by the same pulling-back 
procedure: 

mtf = A-^(filtfP). 

By convention, we set 


filt^Oj^ = a{I) and hlt^^g^ = P, 

where (j{I) is the cr-field generated by the random set of labels I. As a consequence for any 
z G M we have filt;^^^ C filt^ C hlt+go. 

We finally introduce the notion of stopping level, which is simply a reformulation of the 
notion of stopping time in our context where the filtrations are indexed by levels instead of 
times. 
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Definition 3.1 (Stopping level, Stopped cr-field). Let {J^z)z£R a filtration on 
A stopping level Z with respect to {J'z)z£W. ® random variable with values in M such that 

{Z < z} € J^z for any z G MU{— oo, +oo}. The stopped a-field, denoted by Tz, is characterized 
as follows: 

A G iFz if and only if Vz G M, A n {Z < z} G Tz- 
In particular, Z is a Tz-measurable random variable. 

Remark 3.2 (On the definition of the filtrations). In many cases of practical interest, for 
any ^ G M, filt 2 is defined as the smallest filtration which makes an application Fz : V ^ 
measurable, for some Polish space £. Then, filt^®^ is the smallest filtration which 
makes the application Gz : —>■ measurable with Gz{{X^'^i)n^i) = {Fz{X^^'>))nej. For 

example, in the setting of Section 0 Fz :V ^ V and Fz{x) = (a:tAT4x))t>o- 

3.1.2 The resampling kernels 7r^(x, ) 

The second ingredient we need in addition to the filtrations introduced above is a transition 
probability kernel from M x "P to "P (M x "P being endowed with the Borel u-field P(M x P)): 
(z, x) G M X P I—)■ 7r2(x, •) G Proba(P). By convention, for any x G P , we set 7r-oo(x, •) = vr 
(which is consistent with Assumption below) and 7r+oo(3;, •) = explicit example 

of a resampling kernel in the Markov chain example of Section we refer to Section |2.4[ 

This kernel is used in the resampling step as a family of transition probabilities from P 
to P, indexed by the level z. For a given level 2 ; G M and a given state x G P, 7rz{x,dx') is 
the probability distribution of the resampling of the state x from level z. In the following, we 
will refer to this transition probability kernel as a resampling kernel, since it is used in the 
resampling step. 


3.1.3 Assumptions on (filt^) 2 g]R and (7rj;)2g]R. 

We will need two assumptions on (filt^)^gR and ('7r2)2g]R. The first assumption states a right 
continuity property of the mapping TTz{(j)){x) with respect to z and is required to apply the 


Doob’s optional stopping theorem in the proof of Lemma 4.5 


Assumption 1. For any x G P, and any continuous bounded test function ip : P 


j M M 

I z i-G 'Kz{p){x) 

is right-continuous. Moreover, \\mz^-ooT^z{T){^) — 

Recall the notation introduced in (Q: Mz G M, Vx G P, Trz(p)ix) = p{y)-Kz{x,dy). 
Second, we require a consistency relation between the filtration (filt^)^gR and the transi¬ 
tion probability kernel (vr 2 )^gR. 

Assumption 2. Let us consider a random variable X, (filt^)^gR and {7rz)z£R os introduced 
above. We assume the following consistency relation: if X is distributed according to 7rz{x,-) 
for some {z, x) G M x P, then for any z' > z and for any bounded measurable test function 
: P —)■ M, 

E ((^(A)|filty) = 7rz'{p){X) a.s. 
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As a consequence (by letting z —>■ — oo in the previous assumption), if X distributed 
according to vr, then for any z' G M 7rz'{X, •) is a version of the law of X conditional on filt^. 
Therefore, the cr-field hlt^ can be interpreted as containing all the information on a replica 
X necessary to perform the resampling with iTz'iX, ■) from X at a given level z' G M. 

Let us finally mention that in addition to these two assumptions and from a more practical 
point of view, it is also implicitly assumed that it is possible to sample according to the prob¬ 
ability measure tt (step (ii) of the initialization step below) and according to the probability 
distribution 7Tz{x, •), for any x £ V and z G M (step (ii) of the resampling step below). 

We will check in Section 3.3 that the Markov chain example of Section enters into the 
general setting introduced in this section. 

We are now in position to introduce the GAMS framework in the following section. 


3.2 The Generalized Adaptive Multilevel Splitting framework 

The aim of this section is to introduce a general framework for splitting algorithms (which we 
refer to as the Generalized Adaptive Multilevel Splitting (GAMS) framework in the sequel). 
The structure of the GAMS framework described in this section is quite similar to the one 
for the AMS algorithm of Section One important difference is the introduction of a family 
of filtrations in the general setting. It iterates over three successive steps: (1) the branching 
or splitting step, (2) the resampling step and (3) the level computation step. These steps are 
performed until a suitable stopping criterion is satisfied. 

We denote by Qiter the number of iterations, which in general is a random variable. At 
each iteration step g > 0 of the algorithm the distribution vr is approximated by an empirical 
distribution over a system of weighted replicas ;= (X^”'’'^), G 'P’'®P, where 

/(?) (2 N* is the (random) finite set of labels at step q of the algorithm and G M+ is the 

(random) weight attached to the replica 

As it will become clear, in order to obtain a fully implementable algorithm from the GAMS 
framework, three procedures need to be made precise (i) the stopping criterion, (ii) the compu¬ 
tation rule of the branching numbers and (hi) the computation of the stopping levels. These 
procedures require to define three sets of random variables: (ijl^-’Q+i))^ 


and (Z('^))q>o, that are used in the GAMS framework presented in the next section 


precise assumptions on these random variables will be stated in Section 3.2.2 


tion below). As already mentioned above, the AMS algorithm of Section ^ corresponds 
to specific choices of these three items, but the GAMS framework allows for many variants 




3.2.1 


The 


(see Assump- 


(see Section 3.5). The estimator associated with the GAMS framework is finally defined in 
Section 13.2.31 


3.2.1 Precise definition of the GAMS framework 

We now introduce the Generalized Adaptive Multilevel Splitting (GAMS) framework, which 
is an iterative procedure on an integer index g > 0. 

The initialization step {q = 0) 

(i) Define the initial set of labels = {1,... ,cardl^*^)} C N*, where card/^^^ is assumed 
to be positive and finite. 
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(ii) Let be a sequence of "P-valued i.i.d. random variables, and distributed 

according to the probability measure tt. 

(iii) Initialize uniformly the weights: for any n G set = 1/card/^*^^. 

(iv) Define the system of weighted replicas and for any 2 : G M, 

define the it- field of events = filt^^°^. 

(v) Sample the initial level (it is assumed to be a (-Fi^^)zeR “ stopping level). 

(vi) Define the cr-field of events . 

Iteration Iterate on q >0, while the stopping criterion is not satisfied. 

The stopping criterion Sample the random variable G {0,1} (which is assumed to 
be J^'^'^^-measurable). If 5^'^^ = 0 then the algorithm stops and we set Qiter = Q- Otherwise, if 
= 1, the three following steps are performed. 


The splitting (branching) step 

(i) Conditionally on sample the N-valued random branching numbers 
which are assumed to satisfy: for any n G 

E > 0 a.s. 

The random variable represents the number of offsprings of the replica XO’?). 

If replica will be split into replicas: the old one 

(parent) with label n G and, if > 1, — 1 new ones (chil¬ 
dren) that are defined in the resampling step below. If = 0, the replica is 

removed from the system. Let us thus introduce the set of labels of such replicas: 

= {" e = 0}- 

(ii) Compute the total number of new replicas = Yln£l(‘i'> — 1, 0}. 

(iii) Introduce the set Inew^^ = {max/^'^^-|-1,..., max /D) + X(9+i)} C N* \ /('?) for new 
labels and update the total set of labels 

/<»+■> = (/''')\4’mI)u/<««>. (25) 


IV 


Set a children-parent map : Jnlw^^ — 5- \ .^killed such that for any n G 

we have 

card |n' G = n| = — 1. 

This map associates to the label of a new replica the label of its parent. The parent 
replica (with label n G \ L^ntd ) used in the resampling procedure to create the 
new replica with label n' G where n and n' are related through the children-parent 

map by (n!) = n. Notice that this map is determined up to a permutation of 

For notational convenience, we extend the map to as follows: p('?'*'^)(n) = n for 

any n G \ 
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(v) 


Update the weights as follows: for all n' G and 


Q{n',q+l) 




n 


G /(9) \ such that 


(26) 


The resampling step 

(i) Replicas in \-^knied resampled i.e. for any n G \-^knied > 

(ii) For n' G /new^\ x^’^' is sampled by branching its parent replica i.e. 

according to the resampling kernel tt 

Then set , 


The level computation step 

(i) For any z G M, define the cj-field of events 

jrU+i) = jr(q) y (t(p(''+^)) V filtf . (27) 

The cr-field generated by pl^+i) contains in particular the u-field generated by (-B^’^’'^'’'^^)„gj( 9 ). 

(ii) Sample the next level G M, which is assumed to satisfy: 

• Z(9+U > Z('?) a.s. 

• is a stopping level with respect to () 

V / zSK 

(iii) Define the cr-field of events . 

Increment Increment q ^ q+1 and go back to the stopping criterion. 

For theoretical purposes, we need in the following to define the system of weighted replicas 
and the associated filtration for all (7 > 0 (and not only up to the iter¬ 

ation Qiter)- This is simply done by considering the iterative procedure above with = 0 
for all q > 0. 

Remark 3.3 (On the labeling). The way the replicas are labeled is purely conventional. 


3.2.2 Prom the GAMS framework to a practical algorithm 


In the GAMS framework, we have defined (see (27)) a family of cr-fields which is indexed both 
by the level z G M and by the iteration index q > 0 and which is denoted by (J^i'^^) 


By 


construction, this family of cr-fields satisfies ^ 


other words, the family ( X- 


7(1) 


q>0,zGM. 


q>0,z£M.‘ 

if g < q' or {q = q' and z < z'\. in 
is a filtration if N x M is endowed with the lexicographic 


ordering. At the end of the g-th iteration of the algorithm (g > 0), one can think of the d-field 
jr(g+i) _ as containing all the necessary information required to perform the next step 

of the algorithm. 

To make a practical splitting algorithm which enters into the GAMS framework, three 
sets of random variables need to be defined: (-R^”’'^''~^^)q>o As 

already stated above, we assume the following on these random variables. 
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Assumption 3. The random variables and satisfy 

the following properties: 


the sequence of random variables {S^^^)q>o needed for defining the stopping criterion, are 
such that is with values in {0,1} and is -measurable; 

the sequence of branching numbers with values in N, are assumed 

to be sampled conditionally on (see Section I .4 for a precise definition) and such 
that E > 0 a.s.; 


the sequence of stopping levels are with values in M, satisfy and 

are such that Z^'^'l is a stopping level with respect to (_ (see Definition 


zSK 


3.1). 


As explained above, once these three sets of random variables have been defined, the 
GAMS framework becomes a practical splitting algorithm which yields an nnbiased estimator 
of (22) (this is the claim of Theorem 4.1 proved in Section 0. 


Let ns emphasize that the reqnirement that Z^‘^) is a )zgK-stopping level is fnndamen- 
tal to obtain nnbiased estimators. It will be instrnmental to apply Doob’s optimal stopping 
Theorem for appropriate martingales in the proof of nnbiasedness. 

As a conseqnence of the measnrability property of (5^'^^)g>o, one easily gets the following 
property on Qiter: 


Proposition 3.4. The random variable Qiter is a stopping time with respect to the filtration 

Remark 3.5 (On the measnrability of the system of replicas with respect to (Ti'^^) 2 g]R). Let 
ns emphasize that for any > 0, the system of replicas is T'-measnrable bnt it 

is not measnrable with respect to (which indeed stores the information only np 

to the stopping level Z^'^)). 


3.2.3 The estimator 

For any integer q > 0 and any bonnded test fnnction (/? : "P —)• M, we define the estimator 


^('?)((^) = ^ 


(28) 


of 7r((/?). As it will be proven in Section any algorithm which enters into the GAMS 
framework is snch that 7r^'^\(p) is an nnbiased estimator of vr(<y9): for any g > 0, E = 

Moreover, nnder appropriate assnmptions (see Theorem 4.1), this statement can be 
generalized when q is replaced by the random nnmber of iterations Quer of the algorithm: 


E 




2g/(<5iter) 


The proof of this resnlt is given in Sections 4.3 and 4.4 and is based on martingale argnments. 
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3.3 The AMS algorithm enters into the GAMS framework 


In this section, we explain how the GAMS framework encompasses the AMS algorithm of 
Section We thus go back to the setting described there and prove that the modelling and 


algorithmic assumptions of sections 3.1 and 3.2 are satisfied in this case 


3.3.1 Modelling assumptions 

Let us first check that the so-called dynamical setting (namely the sampling of paths of Markov 
chains) that we considered in Section]^ for the AMS algorithm enters into the general setting 
of Section 13.11 

In Section]^ {V, B{V), vr) is the path space for Markov chains, endowed with the standard 
topology, as explained in Section 2.1 The filtration (filt 2 )^g]R on {V,B{V)) is defined by: for 
all z G M, filt^ is the smallest cr-field which makes the application x £ V ^ ^tAiT^ix)) ^ 
measurable: 

filG = cr{x ^ {xtA{T4x)))t>o) ■ (29) 

Finally, for any z G M and x £ V, the resampling kernel tTz{x, •) is defined by (|14[)- ([T^. 

Let us now check that Assumptions and are satisfied. The conditions of Assumption 
are direct consequences of the strong Markov property applied to the chain t >-£■ Xt £ S defined 
by 0 at the stopping time (the strong Markov property always holds true for discrete-time 
Markov processes). 

The right-continuity property of Assumption crucially relies on the definition (12) of 
T^z{x) as the entrance time of the path 1xt in the level set z, -t-oo [: the fact that Ja:, oo[ 
is an open set implies z e-)• T 2 (x) is right continuous. More precisely, we have the following 
Lemma. 


Lemma 3.6. Assumption^is satisfied for the resampling kernel defined by (14)-(15). More 
preeisely, for any x £ V, the resampling kernel z G M i--;- tTz^x, .) G Proba('P) is piecewise 
constant and right continuous. 

Proof. First, assume that T^(a:) = -|-oo, which means that H(x) < z. Then, for any e > 0 
we still have T^+e(x) = -|-oo. In that case TTzix, .) is a Dirac mass: 7r^(x, .) = Tr^+eix, ■) = 

‘^(3;tAT_4(a:))i>0 ' 

Now, assume that Tfix) < -|-oo. Then, for e g] 0, ^(xx^(a;)) — z[, T^lx) = T^+e(x), and by 
the definition of the resampling kernel, 7r^(x, .) = 7r^+£(x, .). □ 

3.3.2 Algorithmic assumptions 


As explained in Section 3.2 
GAMS framework 


to obtain a practical splitting algorithm which enters into the 
three procedures need to be made precise: the stopping criterion, the 
computation rule of the branching numbers and the computation of the stopping levels. These 
procedures should satisfy the measurability requirements of Assumption]^ 


The stopping criterion In the AMS algorithm, we set which is indeed 

a T'^-'^bixieasurable random variable, since Z(<}) is a (T'i'^^)^gR-stopping level, see Lemma 
below. 


3.7 
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The computation rule of the branching numbers The branching nnmbers 


are dehned in the splitting step (iv) of the AMS algorithm by (16), for n G -^on>z( 9 )' 


extend the dehnition for n G \ by simply setting = l. It is then 

easy to check that they satisfy the reqnirements of Assnmption Notice that in the AMS 
algorithm, the total nnmber of new replicas — 1,0} is given 

by = card/J^^(,). Moreover, all branching nnmbers are positive, so that = 0- 

Another particnlar featnre of the AMS algorithm is that the map takes valnes in the 

strict snbset of 

on,>Z^9; 

Let ns check that the compntation rnle ( |17[ ) for the weights in the AMS algorithm is indeed 
consistent with the formula ( |26[ ) given in the GAMS framework. First, for n G = 

^on<z(i) ^^oS’ = 1, = n and, consistently, . 

Second, for n G if is clear that E does not depend on n (since 

the random variables are exchangeable in n G addition, by construction. 


/(<?) 

” ^ on,>z(9) 


j^{n',q+l) _ 


n 


rep* 


Thus, we have by a simple counting argument: for any 


n G /' 


(q) 




]g^^(n,q+l)|jc-(g)^ = 


1 


card! 


(<?) 


on,>Z(‘i) , , 

on.yzW 




j(9) 


]E(BK.9+i)|jr(<?)) 

r(q) 

on.>Z^t^) 

Q{n',q+l)\jr{q) 


n 


rep 


card/ 


(<?) 


2 ,>zM 


n 


rep 


- iL(9+i) ■ 


Thus for n € I 


(q) 


2 ,>Z(l'> 


(and since = n) the formula 






rep 


in ( |17[ ) for the AMS algorithm is indeed consistent with the updating formula (26) for the 
weights in the GAMS framework. 

Third, for n G Inew^\ ^ Q{P^‘i+^'>{n),q) .^yiji(;.ij ig again 


consistent with the updating formula (26) for the weights in the GAMS framework since 

n^ p-K(i+ ^ _ 


Computation of the stopping levels Let us now check that the requirements on 
in Assumption 1^ are satished. By dehnition of (see the level computation step of the 

AMS algorithm), it is clear that (actually, the strict inequality > Z^^'> 


holds). It remains to prove that Z^'^) is a stopping level for the hltration (/Tz'^ )^gR. 

We start with an elementary result, which again highlights the importance of the strict 
inequality > z in the dehnitions (12) and (13) of Tz{x) and r^. 


Lemma 3.7. Let X : Q ^ V be a Markov ehain over the state space S (see Equation 
Then the random variable H(X) (where, we recall, the maximum level mapping S is defined 
by & is a z£S.-stopping level: for any 2 ; G M, {H(X) < zj G hlt^. 


Proof. On the one hand, we clearly have the equality of subsets of V: 


{x eV : S(x) < zj = {x e P : T^(x) = + 00 } . 
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On the other hand, = T^(X) is a filt^-measnrable random variable. The resnlt is then a 
conseqnence of these two facts. □ 

We are now in position to prove the last resnlts which is needed for Assnmption to hold. 

Lemma 3.8. For any q > 0, is a stopping level with respect to the filtration 
for any z G M, < 2} G 

Proof. Set by convention = —00 and let ns consider q > 0. Let ns introdnce the k-th 

order statistics over the maximnm levels at iteration q: Let ns 

also introdnce = max{H(X('^’'^'''^)) : n G By definition of (see the level 

compntation step of the AMS algorithm), 

Therefore, for any z G M, (nsing the partition n = < z} U > z}) 

<z} = <z}n 

= < 4 U < z < 

These events are all in the c-field a < 2 :} , < Z^'^'>} ,n ^ (in 

particnlar, the set of labels is measnrable with respect to To 

conclnde, note that by constrnction (level compntation step, (i)) and thanks to Lemma |3.7| 
for any 2 ; G M, 

(T ({h(X(”’''+i)) < 2} , |h(X("’''+^)) < Z('')} ,n G C C 

□ 


3.3.3 Almost sure mass conservation 


The classical AMS algorithm satisfies an additional nice property, namely it conserves almost 
snrely the mass in the following sense: 


Definition 3.9. A splitting algorithm which enters into the GAMS framework satisfies the 
almost sure mass conservation property if 


Vg > 0, ^ = 1 a.s. 


(30) 


Indeed, nsing the definition (0 of the weights and in particnlar the fact that all the 
weights (G'^"^’'^^)^ ^ ^(q) are the same: for any q > 0, 

on,z(^^ 


n'g/C^+l) 


y~^ ^(n',q+l) ^ Q(n',q+1) 


/^j(g+i) 



n' 

gj{9+l) 
t ion 


E 


Gin‘ 

\q) 

+ E 


on,< 2 ^ 9 ) 




on,>z(9) 

|jj(9 + l) 
'inew 

E 


Gin' 

\q) 

+ E 

Qin,q) 

on,<z(9) 




on,>z(5) 



^rep ^(P(i+l)(n),c}) 


n 


rep 


Q{n',q)_ 
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Thus, since = 1, by induction on q, (30) is satisfied. This property will be 

useful in Theorem |4.1 [ below: it is one of the two sufficient conditions to prove the unbiasedness 
of the estimator ip = being defined, we recall, by (28)). 

Notice that this property is not generally satisfied for any algorithm which enters into the 
GAMS framework. Actu ally, it is only true in general on average: by taking ip{x) = 1 and 

below, one indeed obtains that Vg > 0, E = 1. 


Qiter = Q hr Theorem 


4.1 


3.4 Reformulation of the AMS algorithm as a Sequential Monte-Carlo 
method 

The aim of this section is to make more explicit the link between the AMS algorithm and a 
Sequential Monte Carlo (SMC) sampler, for readers who are familiar with SMC methods. For 
those who are not, this section can be easily skipped. 

For a reaction coordinate with discrete values, the AMS algorithm presented in Section 
can be understood as a sequential importance sampling algorithm, where weights are assigned 
to replicas, and replicas are then duplicated and killed to compensate for these weights and 
obtain unbiased estimators (see for example [12] for a nice introduction to SMC methods 
and [7| for a discussion of the relationship between SMC algorithms and multilevel splitting 
algorithms). 

To highlight the similarity between the AMS algorithm and a SMC sampler, let us assume 
that the reaction coordinate takes values in the finite set {1, 2,..., Zmax} 

e:5^{l,2,...,z„,ax}. 

Let us now introduce a new way to label the successive iterations of the algorithm, by using 
the levels z G {1, 2,..., Zmax} rather than the iteration index q > 0. Notice indeed that for 
each z, there exists a unique iteration index g > 0 such that z G Let us then 

set: for all z G {1, 2,..., Zmax} and q such that G [Z^^~^\ 

'jW = /(-J), 

< G N), Vn G /('?), 

fp{n,z) ^ Q{n,q) ^ VnG /(''). 

The random variables jG\ and are thus respectively the new set of labels, the new 

system of replicas and the new system of weights, indexed by the levels z rather than the itera¬ 
tion index q. One can then check that the sequence of weighted replicas is 

obtained by applying a standard sequential Monte Carlo algorithm which iterates the following 
two steps: 

1. A splitting step, equivalent to the splitting step of Section [23] replicas that have reached 
the z-level set are split and weighted according to the splitting rule which conserves the 
total number of replicas above z. The weights of replicas that have not reached the 
^;-level set are not modified. 

2. A mutation step, where all replicas are resampled independently according to vr^, but 
with paths stopped at the stopping time T^+i (defined by ([l2|)). 
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In the SMC algorithm presented above, all the replicas are resampled which is not the 
case for the classical AMS algorithm. The following lemma is then crucial to reformulate the 
AMS algorithm as a S MC sampler. We recall that the resampling kernel TTz{x,dx') has been 

Moreover, the children-parent mapping has been extended to by 


2.4 


defined in Section _ 

setting = n for n G \ ■ 

Lemma 3.10. Consider the algorithm AMS introduced in Section \2.5\ Assume that in the 
resampling step, all replicas are resampled. More precisely, replace the two items (z) and (ii) 
in the resampling step by a single one: 

(i) For all n G Ion^^\ is sampled with the resampling kernel dx'). 

Then, the probability distribution of the algorithm is unchanged: the random variables 

have the same law for the modified algorithm as for the original 

one. 


4.3 


{ii)q and an induction argument on 


This lemma is easily checked using Proposition 
q>0- 

The discussion above thus shows that the AMS algorithm can be recast in the framework 
of sequential sampling. The interpretation of multilevel splitting methods as a sequential 
sampling method is not new (see e.g. |16|i. We refer to the classical monographs mm 
for respectively applications of Sequential Monte-Carlo methods in Bayesian statistics, and a 
comprehensive associated mathematical analysis. In particular, from the point view of im, 
the Adaptive Multilevel Splitting method for Markov chains (namely the dynamical setting) 
considered here can be interpreted as a time-dependent Feynman-Kac particle model with 
hard obstacles where: (i) the time index is given by the discrete levels z, (ii) the particles are 
paths of the Markov chain stopped at T^,, and (iii) the hard obstacle at level 2 corresponds with 
reaching A before the z-level set. Note however that strictly speaking, the version presented 
in the present section slightly differs from the classical presentation of Feynman-Kac particle 
models in m since all the replicas are used in the estimators, including those who have 
reached A. But this does not change the global picture. 

To conclude, let us recall that the construction of unbiased estimators for averages of the 
form is standard for SMC algorithms. In the SMC language, averages such as (|^ are 
called non-normalized averages, normalized averages being conditional expectations, namely 
ratios of two such averages. From this point of view, the unbiasedness result of the present 
work (see Theorem 4.1) is therefore not a surprise. Actually, another strategy of proof of 
Theorem 4.1 would be to rely on general unbiasedness results for SMC samplers, using the 
equivalence between AMS and SMC described above for discrete reaction coordinates, and 
then to extend the result to continuous reaction coordinates by considering a continuous limit 
of discrete levels. 


3.5 Examples of algorithmic variants 

In this section, we consider the setting and the AMS algorithm of Section and we propose 
variants which fit into the Generalized Adaptive Multilevel Splitting framework and may 


improve the efficiency of the algorithm in several directions (see Sections 3.5.1 3.5.2 and 3.5.3). 
In particular. Theorem |4.1| applies to the three examples detailed below. Moreover, we also 


illustrate the interest of the general setting we have introduced by providing in Section 3.5.4 
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an example which does not enter into the standard dynamical setting (sampling of paths of 
Markov chains) and for which the AMS algorithm could be used. 


3.5.1 Removing extinction 

We first introduce a variant of the AMS algorithm in the Markov chain setting (Section]^, 
which is designed in order to remove the possibility of equality of levels for two different 
replicas - this phenomenon is explained in Remark 2.4 for the AMS algorithm. This variant 
enters into the GAMS framework and thus leads to unbiased estimators. With this variant, 
exactly k replicas are resampled at each iterations. In particular, extinction of the system of 
replicas cannot occur. However, the algorithm requires the use of a rejection procedure for 
each resampling, which may slow down the simulation. Let us now describe this variant in 
detail. 

Let z G M be a level and x £ V. The definition (see & of T^z{x) remains the same, 
but the resampling kernel vr^ defined by (14)-(15) is modified as follows. Given x € V, the 


probability law Tr^ix, dx') is the distribution of a random variable Y £ V sampled as follows: 


• For t < T:z{x) -l,Yt = xt- 

• When Tj;(x) < +oo, is sampled using the transition kernel P(xx,,(x)-i) •) of 

the Markov chain, conditionally on ^(Fp^(a;)) > z. This can be done for example us¬ 
ing a rejection procedure: a sequence (T£)£eN* of i.i.d. random variables distributed 
according to P{x'y^{x)-i-: ■) is sampled, and one considers = Tl where L = 

inf{£GN* :e(Tf) >4- 

• For t > T^(x), the Markov transition kernel P is used to sample the end of the trajectory, 
up to the stopping time T^(F) where the path is stopped: 


Law(Ft|(W)o<.<t-i) = P{Yt-i, .). 

The definition of the filtration (filt^)^gR needs to be adapted in order to check Assump¬ 
tion 1^ The filtration filt^ is the smallest cr-field which makes the application x £ V ^ 
(T2(x), (xi/\(T,(x)-i))t>o) G N X P measurable: 

filG = a{x ^ {Tz{x), {xt/s,(T,ix)-i))t>o))- (31) 

Notice that we need T^(x) in addition to (2:t^(Xz(a;)-i))t>o since we need to know the time 
at which the chain reaches the level z. In order to check Assumption let us introduce 
the auxiliary Markov chain with values in 5 x {0,1}: Xt = (Xt, Then Assump¬ 

tion 1^ follows from the strong Markov property applied to Xt and the family of stopping 
times indexed by z defined by fz = inf {t > 0 : lg(Xt+i)>z ~ l}- Indeed, = filt^, and 
E(<^(X)|PVj=7r,((^)(A). 

With this modification of the classical AMS algorithm of Section]^ it is easy to check that 
the event that two replicas have the same maximum level is of probability zero, at least if the 
natural additional property is satisfied: if Fp and F 2 are generated according to P(a^T 2 ( 3 ;)-i) • )> 
where x G P is such that T 2 (x) < -|-oo, then P(^(Fi) = ^(F 2 )) = 0. This additional condition 
is satisfied in many practical cases, for example if the Markov Ghain is defined as the Euler- 
Maruyama discretization of a Langevin dynamics, see Q. 
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3.5.2 Randomized level computation 

To run the AMS algorithm of Section a sorting procedure of the replicas according to their 
maximum levels is required. More precisely, at the initialization step, all replicas must be 
sorted according to their maximum levels; at further iterations, the procedure is faster, since 
only the new replicas that have been resampled need to be sorted. 

It is possible to propose algorithms within the GAMS framework which never require 
the sorting of the entire system of replicas. The idea is to sample at iteration q a (small) 
random subset X('?+i) The level is then dehned as the fc-th order statistics 

of maximum levels computed only on the replicas with labels in Notice that such 

algorithms introduce some flexibility in the implementation of the level computation, which 
may be useful to design efficient parallelization strategies to speed up the computation. 

Notice that Assumption on the stopping-levels is then satished by slightly 

modifying the dehnition of the cr-helds indexed by z in the level computation step as follows: 

jr{q+i) ^ jr{q) V V mtf V cj(x(‘'+^)). 


3.5.3 Additional selection 

It is also possible to modify the branching rules so that larger branching numbers are affected to 
replicas which are in areas which have been identihed as important to get an accurate estimate 
of Tr{(p) (in the spirit of a sequential importance sampling algorithm). For instance, in the 
bi-channel case of Section |5.2[ it is possible to enforce a higher probability of branching for 
replicas which visit the channel which is not sampled sufficiently well. The only requirements 
to implement these strategies is that the branching numbers are dehned in such a way that 
Assumption is satished. 


3.5.4 Application to the sampling of a Gaussian bridge 

We presented above variants of the AMS algorithm. The GAMS framework also allows for 
different general setting: splitting algorithms can be used to sample other random variables 
than paths of Markov chains with levels dehned as sup{^(At^T-A)t>o} for some stopping time 
TA and some reaction coordinate function Actually, under appropriate assumptions, the 
following cases also enter into the setting of the GAMS framework: path-dependent reaction 
coordinates (duration of the path, integral over the path), sampling of continuous time stochas¬ 
tic processes (diffusions, jump processes, branching processes), sampling of non-homogeneous 
stochastic processes, etc... Let us discuss in this section as an example the sampling of a 
Gaussian bridge. 

Let K G N* be given, and consider the following Gaussian bridge distribution inV = 


7r{dxi... (Ixk) 




where > 0 is the appropriate normalization constant. This distribution is a discrete version 
of a Brownian Bridge, and can be interpreted as a Gaussian random walk (Ai, A 2 ,..., A^+i) 
starting from Xq = 0 and conditioned on {A^+i = 0}. 

The dehnition of the maximum level is H(x) = max{xj : i G {!,..., k}} and we wish to 
implement the AMS algorithm to compute small probabilities of the form P(H(A) > z) for 
some 2 ; > 0. 
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For this purpose, let us define T^z{x) = inf{f G {1,. .., k} : x* > z} G {1,..., k, + 00 } , and 
consider the filtration filt^ = , xx^(a;))- 

Let us now define the resampling kernels TTz{x,dx'). For a given x G assuming 
that "^z{x) < + 00 , let us introduce a random variable X' G such that X[ = Xi for 
i G {1,... ,T^(x)} and {X'^ , X'^) ~ 0) where for each m > 1, and 

xo,Xm+i G M, Bm{xo.,Xm+i) denotes the Gaussian bridge distribution 


B^{xo,Xm+l) = ^e-M(^0-xi)2 + (xi-X2)2+...+(x,„_i-x,„)2 + (x„-x^+l)2)^^^ _ _ _ 


We then define 

7r2(x, dx') = Law(X'). 

This general setting enters into the GAMS framework, and satisfies in particular Assump¬ 
tions and above. Assumption is a consequence of the following Lemma, applied to 
K = Tz{X) A (k — 1) (using the fact that (t(Ai, ... ,Xx^(x)) = hh^)- 

Lemma 3.11. Let {Xi,... ,Xk) ~ Bk(xo, Xk+i) and let K he a stopping time with respect to 
the natural filtration of (Ai,... ,X^) (i.e. for any I G {1, ... ,k}, {K < 1} G o'(Ai, . . . ,Xi)) 
such that K < K. Then, 


Law((Aii:+i,..., Ak)|(Ai, ..., Xk)) = Bk-k{Xk, Xk+i). 


Proof. First, the lemma is easily checked for K a deterministic integer, using the formula for 
conditional densities. Then the result is proven by conditioning on each value of K and using 
the fact that AT is a stopping time. □ 

This example can be generalized in various ways. First, it is possible to build resampling 
kernels Trz{x,dx') such that only the components x* such that Xi > z are resampled. Second, 
the same kind of algorithms can be applied to discrete Gaussian Markov random fields. 


4 The unbiasedness theorem 


In the present section, the unbiasedness of the empirical distribution over weighted replicas is 
proven. This is the content of Theorem 4.1 We first provide in Section 4.1 a summary of the 


notation used in the GAMS framework of Section The latter will be helpful to follow the 
statements and proofs of the present section. The main result is stated in Section [4.2| and the 
last two sections |4.3| and 4.4 are devoted to the proof of this result. 


4.1 Summary of GAMS notation 


We follow the algorithmic order of the GAMS framework of Section In the following, 
if : V ^ M. denotes a bounded test function. We also introduce below a new notation for an 
intermediate empirical distribution (see (32)). 


The initialization step (q = 0) The system of weighted replicas is denoted by = 
with uniform weights = 1/ card/^*^^ for n G . The first level is 

with the associated cr-field . 
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Iterations Iterate on g > 0 the following steps. 


The stopping criterion If the stopping criterion is satished, the algorithm stops at this 
stage, and we set q = Qiter- At the beginning of iteration g, the weighted empirical distribution 
estimator (dehned by (28)) is: 


n£lM 

The splitting (branching) step The random branching numbers are denoted by 
the updated set of labels = (/('?) \ U /new^\ the associated children-parent map 

plij+i) ; /(q+i) 7 ( 9 ) \ and the associated new weights (G*'"''’'^^^^)„/g 7 {q+i) • All the 

latter variables are sampled conditionally on and are jr(9) V ( 7 (^( 9 +!)) measurable. The 
weighted empirical distribution at this stage is denoted by 

^( 9 + 1 / 2 ) = Y (32) 

n'e/(9+l) 


The resampling step The system of weighted replicas after resampling is denoted by 


The level computation step The new level is denoted by the associated cr-held is 

dehned as 7 ^( 9 +!) = p( 9 ) v a{P^i+^^) V hlt^Jj+iJ. 


As already explained at the end of Section 3.2.1 we will use use in the following the whole 
sequence (A’('?^)q>o of weighted replicas as well as the whole sequence of related hltrations, 
which are simply obtained by considering the algorithm without stopping criterion. 


4.2 Statement of the main result 

The main theoretical result of this paper is the following. 

Theorem 4.1. Let {^^‘^'^)o<q<Qiter sequence of random systems of weighted replicas 

generated by an algorithm which enters into the GAMS framework of Section^ In particular, 
the Assumptions^ and^ on the general setting (see Section 3.1) as well as the Assumption^ 
on the stopping criterion, branching numbers and level computations (see Section are 

supposed to hold. 

Assume moreover that the number of iterations Qiter is almost surely finite (this condition 
writes P((5iter < + 00 ) = 1) and that one of the following conditions is satisfied: 

• Qiter is bounded from above by a deterministic constant. 


• or the almost sure mass conservation (30) is satisfied. 
Then, for any bounded measurable test function <y 9 : "P —)• M, 

]£ ^.^(Qiter)((^)^ = 7r(p). 


,£/{<?)) 
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Notice that a deterministic number of iterations Quer = ® £ N satisfy the assumption^ 
of Theorem 4.1 so that in the above setting (namely under Assumptions [T][2]j^ : 

Vgo > 0, E = 7r((^). 


As a corollary of Theorem 4.1 and thanks to the discussion in Section 3.3 which shows 
that the AMS algorithm of Section [2.5| enters into the GAMS fram ework, we also obtain that 
the AMS estimator (f = ((^) defined by (19) in Section 

7r{(p). 


2.6 


is an unbiased estimator of 


The strategy to prove this theorem is to introduce the sequence of random variables 




(33) 


for a fixed bounded measurable test function : "P —?■ M and to show that the process 
(M('?)((^))^gpj indexed by O' is a martingale with respect to the filtration Since, 

by Proposition |3.4[ Qiter is a stopping time for this filtration, Doob’s stopping theorem for 
discrete-time martingales can then be applied to obtain Theorem 4.1 The next two sections 
are devoted to the proof of Theorem |4.1| 


4.3 Proof of Theorem 14.11 

The following definition of conditionally independent replicas will be useful in the proof. 

Definition 4.2. Let Z be a random level, / C N* a finite random set of indiees and Q a 
a-field of events. We assume that cr{I) V cr{Z) C Q. We say that the random system of replicas 
is independently distributed with distribution {7 Tz{X^^\ ■))n&l conditionally on Q, 
if for any sequence of bounded measurable functions {(pn)n&l from V to M, we have 


E n ¥.n(xW)|g = Yl vrz(¥^n)(xW). 

\n£l / n£l 


Let US now state two intermediate propositions before proving Theorem 4.1 The first 
proposition states that, in the sense of Definition |4.2[ the set of replicas with indices in 
1(9) (resp. /(5+1)) are -conditionally independent (resp. V a ^p{g+i) ^-conditionally 

independent) with explicit distributions. 


Proposition 4.3. Let us consider the setting of Theorem 4-1 For any integer g > 0, 

{i)q The replicas are independent with distribution (^Tr^(q)(X^”'’'^\ •))„g/( 9 ) 

ditionally on . 

{ii)q The replicas 'g/( 9 +i) o-re independent conditionally on V a (pD+i))^ with 

distribution ^TT^(q)(X^^'''^~^^'l, . )^ ^ 

The second proposition states intermediate equalities between conditional averages of the 
empirical distributions, required to obtain the desired martingale property of (M^'^^((^)) 


and which are easily obtained from Proposition 4.3 


'g>0’ 


^To obtain 


in Qiter = (Jo, one simply has to choose 5^'^^ = < 1 ^ 9o 

(1 if g > go 
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Proposition 4.4. Let us consider the setting of Theorem 4-L For any integer q > 0 and for 
any bounded measurable test function (/? : P —)■ M, 

{iii)q E(7r('?+V2)(<^)|jr(<?)) = E 

{iv)g E (7r('?+i)((/?)|p(9) V = E (7r('?+V2)(<^)|jr(<?) y (t(P(9+i))). 


The proofs of both Proposition |4.3| and Proposition |4.4| are postponed to Section 4.4 We 


are now in position to prove Theorem 4.1 


Proof of Theorem \4-l\ The proof consists in first proving that defined by (33) 


is a (j-'^'^^)^^Q-martingale and then applying the Doob’s optional stopping theorem. 


Notice that E = E(7r('?+^)((^)|T'('J)) and let us compute the right-hand 

side. First, from point {iv)q of Proposition 4.4 and since C V we get 




Second, from point {iii)q of Proposition 4.4 we have 


We thus have for any q > 0, 


E ( 


= E ( TT 


(34) 


and is therefore a (p('^))^gpj-martingale. 

We now focus on stopping the latter martingale at the random index Qiter- By assumption, 
either the almost sure mass conservation property (30) is satisfied, in which case 

is a bounded martingale (since |m('^)((/?)| < ||(^||oo), or Qiter < 9max for some deterministic 
real number ^max £ 1^- In both cases, we apply the Doob’s optional stopping Theorem (see 
for instance 1211 . Chapter 7, Section 2, Theorem 1 and Corollaries 1 and 2) to the martingale 


((/?))and with the stopping time Qiter with respect to the filtration We 


obtain 


qSN' 




which concludes the proof of Theorem 4.1 


□ 


4.4 Proofs of Propositions 4.3 and 4.4 


Proposition |4.3| requires an additional intermediate result, namely the propagation Lemma 4.5 
below. This lemma gives rigorous conditions under which the property on a system of replicas 
of being independently distributed with distribution {Trz{X^'^\ . ))n&l conditionally 
on X can be transported from the (j-field T" to a larger d-field. It is based on Doob’s optional 
stopping theorem for martingales indexed by the level variable Notice that it is the only 
result where the right continuity property of Assumption is explicitly used. 
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Lemma 4.5. Let us assume that Assumptions^ and^hold. Let Z G M U {—oo,+oo} be a 
random level, Q a a-field, and / C N* a finite random set of labels. Assume that a{I)\/ a{Z) C 
Q. Consider a random system of replicas At = which is independent ly di stributed 

with distribution {7 Tz{X^^\ . ))ne/ conditionally on Q (in the sense of Definition 4-2)- Set 


yz G R, Gz = G flit 


A" 


(35) 


and assume that Z' gRC {— oo, +00} is a stopping level for the filtration (^Gz)z£M. that, 
almost surely, Z' > Z. 

Then the replicas n^l are independently distributed conditionally on Gz'i with dis¬ 

tribution {Trz'{X^^\ ■))n&r 

Proof of Lemma \4.5[ Step 1. The first step consists in proving that for any fixed 2 ; G M, the 
system of replicas is independently distributed with distribution z\j •))„£/ condi¬ 
tionally on G V filt^. By a standard monotone class argument, it is sufficient to show that 

Em = E ( n ^ZWz{Tn){X^"Mn{X^"^) Y 

\n£l ) \n£l 



where ((pn}n>i ranges over bounded measurable test functions from V to M, (V'n)n>i ranges 
over filt^-measurable test functions from P to M, and Y over bounded ^-measurable random 
variables. Let us denote X = E T) the left-hand side. Since Y is 

^-measurable, by Definition |4.2| of the conditional independence we get that 

I = E (l[7TziTnf^n){X^"^)Y] . 

Vne/ / 


The functions (ipn)n>i being filt^-measurable, they are a fortiori filt^vz'-measurable for any 
z' G M. Assumption 1^ on the resampling family then yields 

'n:z’{Tn'fin){x) = TTz'{Trz'Vz{'GnTn)){x) = z'z'y z{Tn)){x) ■ 


As a consequence, using again that the system of replicas is independently dis¬ 
tributed with distribution (^7rz{X^'^'l, conditionally on G and that Y is ^-measurable, 

we get the following identity 


X = E 


n {^PnTTZyziTn)) {X^"^)Y =^11 ^ZVz{Tn){X^"^)MX^^^)Y 


\n£l 


\n£l 


and this concludes the first step. 


Step 2. We now prove the main claim of this lemma, namely the fact that the replicas 

are independent with distribution (^7rz'{X^^\ . ))^^j conditionally on Gz'- Let us first assume 

that the test functions {(pn)n£i ure continuous from V to R. 

In order to come back to a classical setting to apply Doob’s optional stopping theorem, let 
us introduce a continuous, one-to-one and strictly increasing change of level parametrization 
Z : [0,1] —)> M U {—oo,-|-(X)}. Let us consider the following stochastic process indexed by 
t G [0,1]: 


= E 
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It is a bounded (since I is ^^(j^-measurable for all t) and thus uniformly integrable martingale 
with respect to the filtration {Qz{t))te.[o,i\- addition, A^i = E (](([^ where 

G+cx) = Q\l filt(^gQ. Thanks to Step 1 above, we get: almost surely, for all t G [0,1], 

n£l 


Therefore, Nt is almost surely a right-continuous bounded martingale from Assumption 
on (7r2)^g]R. By assumption, T' = Z~^{Z') is a ^j-stopping level, and we can 

use a Doob’s optional stopping argument for right continuous bounded martingales (see for 
instance m Theorem 3.22]) to get 


E {Ni\G2:{t')) — ^T' 


which can be rewritten as (since Z' > Z a.s.) 


E 


\{iPn{X^^^)\Gz' 

.n£l 


n£l 


This equality actually holds for any sequence of bounded measurable functions ((pn)nei since 
continuous bounded functions are separating. This concludes the proof of Lemma |4.5[ 


□ 


Thanks to Lemma 4.5 we can now prove Proposition 4.3 


Proof of Proposition \4-^ We proceed by induction on the iteration index g > 0. We first 
prove directly the statement {i)q ^ {ii)q and then {ii)q =► (*)(}+i using Lemma 4.5 The 


initialization step consists in proving (i)o using Lemma 4.5 In this proof, {'~Pn)n>i denotes a 
sequence of bounded measurable test functions from V to M. 

Proof of {i)o. The statement (i)o reads 


E 


n •pn (x<”.»)) i^i»)) = n 

^eP^'> / n£lW 


7r^(o)(¥.„)(X(-’0)), 


4.5 


where = filt^[o). This is exactly the result of Lemma 
G = it(/^°^), and recalling that the replicas are initially indepenc 
to vr. 


taking Z = —oo, Z' = Z^^\ 
ent and distributed according 


Proof of (i)q => {n)q. Assume that {i)q holds. We rewrite property {ii)q as follows 


E 


Pn’ 

n'e/i'j+i) 


1^(9) Va(p('?+1)) 


n'e/( 9 +i) 


(36) 


and we now prove this identity. 
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Let us recall that in the resampling step, the replicas with labels in are not resampled 
and the replicas , (q+i) are sampled in such a way that they are independently dis- 

^ t-^new 

tributed with distribution (7r2(q) .)) , Aq+i) conditionally on V 

^ fe-^new _ 

Therefore, by definition of the total set of labels given in (25), one obtains 


E 




^n'e/(9+l) 

= E 




( 





It c.inew 






Next, from the induction hypothesis {i)q, the replicas are independent with dis¬ 
tribution . ))^^j(q) conditionally on Since is sampled conditionally 

on F^‘^\ the replicas {X^"‘’'^'))^^j(q) are also independent conditionally on \/, with 
the same distributions. Therefore (notice that and are X^'^)V(T(P^'^'’'^^)-measurable) 


/ 


E 


n ‘Pn |X(^) V u(p('^+l)) = n 

n \.'killed 

Gathering the results leads to 


\n'el(i+P / n'el(i+P 

From the resampling step and Assumption]^ the following identity holds: 

V(Z>0,Vn'G/(‘^+i), 7r^(,)(X(-'’''+i), .)= 7r^(,).). (37) 


This concludes the proof of ( |36| ). 

Proof of {n)q ^ (Oq+i- Let us now assume that {ii)q holds. To prove that {i)q+i holds, it is 
sufficient to check that 


E 


n 

ne/( 5 +l) 


ifn |jr(-?+l) j = 


n vr^(,+i)(7^n)(X("’'?+L). 

,lg/{9 + l) 


This is again exactly the result of Lemma 4.5 applied to taking Z = Z^'^\ Z’ 


Zii+^) and Q = F^^'^ V a{P^'^~^^^) so that Qz = (where, we recall, Fz‘^~^^'‘ is defined 

Equation (j^). 


by 

□ 
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Finally, let us prove Proposition |4.4[ 


4.3 


Proof of Proposition \4-4\ The first equality {iii)q is a direct consequence of the dehnition of the 
branching numbers. The second equality {iv)q is obtained as a consequence of Proposition 
by combining {i)q and {ii)q. 

Proof of {iii)q. The proof of this assertion is a direct application of the branching rule. Indeed, 
by dehnition of the weights ’9+^) given in (26), by dehnition of the branching numbers 


as the number of offsprings of the n-th replica, and because these branching 
numbers are independent of conditionally on , we get 





Q{p(i+^'l(n'),q) 








= E (^7f('?)(</?) I 


Proof of {i)q + {ii)q {iv)q- Using successively {i)q, the identity (37) and (u)q, we have: 


= E (7r(9+U((^)|^(-?) V , 


□ 


5 Numerical illustration 


The aim of this section is to illustrate the behavior of the AMS algorithm as dehned in 
Section in various situations involving discrete-time approximations (Q of the overdamped 
Langevin dynamics in dimension 1 and 2. 

We would like to discuss in particular the unbiasedness of the AMS estimator p of p = 


¥{tb < ta) (see the formula (20)) whatever the choice of the reaction coordinate the number 
of replicas n^ep and the minimal number k of replicas which are declared retired and resampled 


at each iteration of the AMS algorithm. Indeed, from Theorem 4.1 we know that 


V f, Urep, k, E (p) = p = F{tb < Ta). 

In the following, {pm)i^m^N refers to independent realizations of the estimator p obtained 
by N independent runs of the algorithm and the associated empirical mean is denoted by 


1 ^ 

Pn = ^Y.P-^- 


( 38 ) 


m=l 
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The variance of the estimator p is also investigated numerically, and it is shown in some 
two-dimensional situations that the variance heavily depends on the choice of the reaction 
coordinate. In the following, we will denote by 




1 ^ 

m=l 


- {PN^ 


(39) 


the size of the 95% empirical confidence interval computed using the empirical variance ob¬ 
tained over N independent runs of the algorithm. 


The section is organized as follows. In Section 5.1 we illustrate on a simple one-dimensional 
test case the importance of a proper implementation of the branching and splitting steps in 
order to obtain an unbiased estimator of (1^. Then, in Sections 5.2 and 5.3 we give two 


examples in dimension 2 on which we discuss the efficiency of the AMS algorithm by studying 
how the convergence of the estimator depends on the parameters Urep and k. Finally, in 


Section 5.4, we draw some conclusions and practical recommendations from these numerical 
experiments. 


5.1 One-dimensional example: Brownian-drift dynamics 

Let us first consider a one-dimensional example, with a reaction coordinate ^ : M —)• M which 
is an increasing function. Of course, in this situation, the AMS algorithm does not depend 
on The aim is thus here to show the unbiasedness of the estimator whatever Urep and k. 
Moreover, we would like to illustrate the fact that incorrect implementations of the branching 
and splitting steps may lead to strongly biased results. 

The model Let G M be a drifted Brownian motion, starting at xq, with drift 

—fj, < 0 and inverse temperature /3: for any t > 0, Xt = xq— iJ,t+ where {Wt)t^Q is 

a standard Brownian motion. We use the explicit Euler-Maruyama method with a time-step 
size At > 0: for any z G N 

Xi+i = Xi — fiAt + \/ 2j3~^At Gi, Xq = xq, 

where the random variables (Gj)jgN ^-re independent standard Gaussian random variables. 
Given a < xq < b, we consider the estimation of 

p = F{Tb < To) 

where r;, G N and Ta G N are the first hitting times of ]6, -|-oo[ and ] — oo, a[ respectively. 

In the sequel, we choose the initial condition xq = 1, as well as the two barriers a = 0.1 and 
b = ^max = 1-9. We choose /z = 1 and we use the following values for the inverse temperature 
(5 G {8, 24} in order to have a range of estimated probabilities over several orders of magnitude. 
Moreover the time-step size is At = 0.1. The number of independent runs is A = 6.10®. 


Biased algorithms In order to highlight the importance of a proper implementation of the 
splitting and resampling steps when many replicas have as a maximum level, we perform 
tests with slightly modihed versions of the AMS algorithm, which happen to yield biased 
estimators. Two biased versions are considered. 
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• Version 1: We first consider the algorithm where the number of resampled replicas is 
exactly k even if /fl'J+i) > namely if more than k replicas have a maximum level 
smaller than the current level of the algorithm. 

• Version 2: We then consider a situation where the replicas are possibly resampled from 
a state with a reaction coordinate equal to the current level 


More precisely, the first version (Version 1) of a biased algorithm is obtained by modifying 
the AMS algorithm of Section [2.5| as follows: 


(i) At iteration q, the working replicas are ordered according to their maximum level (possi¬ 
bly with equalities) and the k working replicas with smallest maximum level are declared 
retired, and the others remain working replicas. The level Z^^'> is still defined as the max¬ 
imum level of the /c-th newly retired replica. As explained in Remark 2.4, some of the 
remaining working replicas may have their maximum levels equal to Z^. Then, as in 
the classical AMS algorithm, k new replicas are resampled by picking (randomly) an 
initial condition among the current working replicas with maximum level strictly larger 
than (namely with labels in Replicas with maximum equal to Z^^^ are 

not split. The resampling kernel is the same as in the AMS algorithm, see Section |2.4[ 


(ii) At the end of the algorithm, the probability p is estimated by 


rn _ U \ Qiter 

' 'rep \ 


/ 


n 


rep 


1 V- 

— E 


. . Ts(X("’'3iter))<TA(X(’"’«iter)) | ) (^0) 


which is consistent with the general updating formula (26) for the weights in the GAMS 
framework. 


The second version (Version 2) of a biased algorithm is obtained by modifying the AMS 
algorithm of Section 2.5 exactly as for Version 1, except for the resampling step. Item (i) is 
modified as: 


(i-bis) At iteration g, the k working replicas with smallest maximum are declared “retired”. 
Then, k new replicas are resampled by picking (randomly) an initial condition among 
the states of all the current replicas with maximum levels larger than Z^‘^\ including 
those with maximum levels equal to Z^'^\ The resampling kernel (defined by (|14[)- 
©) is modified accordingly: the parent trajectory x is copied up to the time Tz{x) = 
inf {t > 0 : i{xt) > z} (with a large inequality) instead of T 2 (a:) (which involves a strict 


inequality, see (12)) and then completed using the Markov dynamics. 


These biased versions do not enter into the GAMS framework. Note that for both modi¬ 
fied versions and contrary to the classical AMS algorithm, the sequence of levels Z^^'l is not 
necessarily strictly increasing. These modifications will increase the number of iterations of 


the algorithm, which in view of (40) explains the bias towards lower probability (see results 
below). 
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Results In Table and Table estimations with the AMS algorithm are given. We observe 
that the estimated probability is stable under changes of nrep and k, with a small conhdence 
interval. This yields the reference values p = 3.6010“'^ for /? = 8, and p = 1.210“^*^ for /? = 24. 
For /3 = 8, we have checked that these results are in agreement with those obtained by a 
standard direct Monte-Carlo estimation with 6.10® realizations. 

In Table (where e-n stands for 10“”), estimations with the two biased versions of the 
AMS algorithm with k = 1 are given. Even for (3 = 8 (for which the target probability is p = 
3.6010“'^^), the bias is non-negligible. For (3 = 24, the probability can be underestimated by a 
multiplicative factor 100 with Version 1. The bias induced by using incorrect implementations 
of the AMS algorithm can thus be very large. 


Notice that we have chosen a relatively large timestep (At = 0.1) in order to easily extract 
the bias introduced by inappropriate modihcations of the AMS algorithm. When the timestep 
size At goes to 0, the two variants we discussed above get close to the classical AMS algorithm 
and the bias disappears, since t he p robability to observe two replicas with the same maximum 
level goes to zero (see Remark 2.4) and the probability that Tz(x) is different of T^(x) for a 
parent trajectory also goes to zero. Finally, we recall that an unbiased variant where exactly 


k replicas are resampled at each iteration has been presented in Section 3.5.1 


^rep 

1 I 

50 

50 

50 1 

100 

200 

k 

1 

1 

10 

20 

1 

1 

Pn 

3.60e-4 

3.596e-4 

3.596e-4 

3.597e-4 

3.597e-4 

3.597e-4 

6n 

1 O.Ole-4 1 

0.004e-4 

0.004e-4 

0.006e-4 1 

0.003e-4 

0.002e-4 


Table 1: Results obtained on the Id test case with the classical AMS algorithm and (3 = 8. 


^rep [ 

10 

50 

50 

50 

100 1 

200 

k 

1 

1 

10 

20 

1 

1 

Pn 

1.20e-10 

1.21e-10 

1.21e-10 

1.21e-10 

1.205e-10 

1.203e-10 

1 

0.3e-10 

0.03e-10 

0.03e-10 

0.03e-10 

O.Ole-10 1 

0.005e-10 


Table 2: Results obtained on the Id test case with the classical AMS algorithm and (3 = 24. 


Version 

1 

2 

2 

1 

2 

2 

^rep 

100 

100 

10 

100 

100 

10 

13 

8 

8 

8 

24 

24 

24 

P 

3.60e-4 

3.60e-4 

3.60e-4 

1.2e-10 

1.2e-10 

1.2e-10 

Pn 

1.74e-4 

3.257e-4 

2.96e-4 

1.40e-12 

6.05e-ll 

5e-ll 

5n 

0.03e-4 

0.003e-4 

O.Ole-4 

O.le-12 

0.07e-ll 

1.5e-ll 


Table 3: Results obtained on the Id test case with biased versions of AMS. 
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5.2 The first two-dimensional example: the bi-channel problem 

The aim of this two-dimensional example is to investigate the importance of the choice of the 
reaction coordinate on the efficiency of the algorithm, on a typical example which has been 
used in previous numerical studies, see |10[ [THl 119] , 


5.2.1 The model 

We consider the following two-dimensional overdamped Langevin dynamics: 

dX{s) = -VS{X{s))ds + ^2p-^dWs, (41) 


where IT is a two-dimensional Wiener process and /3 > 0 is the inverse temperature. 

We use again the Euler-Maruyama method for the time-discretization of the process X. 
The time step is denoted by At. For our numerical simulations we take At = 0.05. For an 
initial condition Xq = xq G and for m G N, the numerical scheme reads 

Xm+l = Xm — X.tX£{Xm) + \/2/3“l (lT(m+l)At “ WmAt), 


where —ITmAt = V^^Gm and (Gm)neN are independent Gaussian random variables 

in with zero mean and covariance Id. 

In the simulations below, the initial condition is Xo = xq = (—0.9,0). The potential 
T : —)> M is given by 


£{x,y) = + 0.2 ( y - I-] +3e 


_ x^-(^y-lY _ (x-lf-y^ _ . (x+l)2- 


This potential is plotted on Figure This potential has two global minima connected 
one to another by two channels: the upper channel (which goes through the shallow minimum 
around (0,1.5)) and the lower channel (which goes through the saddle point around (0, —0.5)). 
The two global minima are close to niA = {xA,yA) = (~1)0) and uib = {xB,yB) = (IjO). 
For some p g] 0, 1[, we consider the sets A and B dehned as the Euclidean open balls of radius 
p around the two minima rriA and tub, namely 


A = B{mA, /o) = I {x, y) G : y/ix- xa)'^ + {v - VaY < p| 

B = B{mB,p) = |(x,y) G {x - xbY + {y - ysf < /o| • 


In the numerical applications, we take p = 0.05. Most of the trajectories starting from xq 
hit A before B. Moreover, A and B are metastable states: in the small temperature regime, 
starting from A (resp. B), it takes a lot of time to leave A (resp. B). 

We are interested in the estimation of the probability p = P(ts < ta), where the hrst 
hitting times ta and tb are dehned by 


ta = inf {m G N : X^ G A} and tb = inf {m G N : Xm G B} . 

We will consider the results of the AMS algorithm for the three reaction coordinates 
with i G {1, 2, 3}: 

1. the norm to the initial point tua: C^{x,y) = y^(x — xa)'^ + (y ~ Va)'^, 
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Figure 3: Plot of the potential function for the bi-channel problem. 


2. the norm to the final point ms: y) = Vb) — \/{x — xbY + {y — VbY-, 

3. the abscissa: ^^{x,y) = x. 


The maximum levels used in the stopping criterion of the algorithm are = 1-9 

and = 0'9- Notice that for i G {1, 2, 3}, we have B C (^*)~^(]-Zmax) +oo[) ©)■ 

In Section 5.2, we take k = 1 and the number of replicas is n^ep = 100. The values of fi be¬ 
long to the set {8.67, 9.33,10} which are associated with probabilities p ranging approximately 
from 2.10“® to 1.10“^®. 


5.2.2 Evolution of the empirical mean 


Let us first perform simulations with N independent runs of the algorithm, N varying between 
1 and 6.10®. We represent on Figurej^the evolution as a function of N of the empirical mean 
PjY (defined by (38)) and of the associated 95% confidence intervals — S]\f/2,Pj^ + (5Ar/2] 
computed using the empirical variance, see (39). 

The colors in the figures are as follows: green (solid line) for red (line with crosses) 
for and blue (line with circles) for The full lines represent the evolution of the upper 
and lower bounds of the confidence intervals, while dotted lines represent the evolution of the 
empirical means. 

From these simulations, we observe that: 


• When N is sufficiently large, the confidence intervals overlap. This is in agreement 
with the fact that p is an unbiased estimator of p whatever the choice of the reaction 
coordinate. 


• The statistical fluctuations depend a lot on the reaction coordinate. In particular, the 
results obtained with seem much better than with or We will come back to 
this in Section [5.2.41 

• The confidence interval being computed empirically, one may conclude that the algo¬ 
rithm is biased by considering the results for N too small (see for example the graphs 
in the right column in Figure Q. This is due to the fact that the empirical variance 


41 














6 


X 10 ® 



3 

2 ^ — 

0 Abscissa 
—N— Norm to final point 

Norm to initial point j 






. 

“ 




xIO''® 





Figure 4: Evolution as a function of N of the empirical mean and of the associated 95% 
confidence intervals — 5]yl^,Pn + ^n/^]- Upper to lower f3 = 8.67,9.33,10. The right 
inserts are zooms of the left graphs on smaller values of N, in order to illustrate the “apparent 
bias” phenomenon. 
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dramatically underestimates the real variance if N is too small. This is a well-known 
phenomenon for splitting algorithms in general called “apparent bias”, see |14| . As /3 
gets larger (namely as the temperature gets smaller), the number of independent runs 
N required to observe overlapping conhdence intervals gets larger. For example, for 
= 13, and N = 6.10®, we were not able to get overlapping conhdence intervals for 
the three reaction coordinates: this is actually an indication of the fact that one should 
pursue computations with larger N in order to get reliable estimates. 

We observe that there are some realizations for which the estimator of the probability 
is very large. These realizations have small probability but they dramatically increase the 
value of the empirical mean and of the empirical variance. This explains the jumps which 
are observed on the empirical average and conhdence interval as a function of the number of 
realizations, see Figure]^ In the next section, we illustrate this aspect. As is usually the case 
with Monte Carlo simulations for rare event simulations, it is impossible to decide a priori if 
the sample size N is sufficiently large to give an accurate estimation. 


5.2.3 Heavy tails 

In this section, we give a more quantitative interpretation of the above observations on the 
evolution of the empirical mean. For a given inverse temperature /3, and a given reaction 
coordinate, we sample N independent realizations of the algorithm denoted by {Pm)Km<N- 
To illuminate the importance of the largest values of the estimator in the empirical average, 
we compute partial empirical averages over the largest values or over the smallest values among 
the N realizations. More precisely, for a hxed Nq, we compute 


Vo,large 

Pn 


1 

Wo 


1 —A^o,small 

Pie) and 


e=N-No+l 


1 

N-No 


N-No 

W)' 


e=i 


where denotes the order statistics of (Pm)i<m<v ^n other words, is the 

average over the No largest realizations of pm, while jg the average over all the other 

values. In particular, 

_ ^0 Vo,large /, ^0 \ Vo,small 

Pn - -^Pn 

In Table 1^ we use N = 6.10®, and Nq = 100 or No = 1000. The results are obtained with 
the same realizations as those used in the previous sections. We observe that the largest values 
contribute a lot to the empirical average over all the runs whatever the reaction coordinate. 
This is characteristic of a heavy tailed distribution. Indeed, for /3 = 10 for example, the value 
of differs a lot from the value of p^. In addition, notice that this difference is more 

important when using and while when using This is in agreement with the fact that 
we observed better results (in terms of statistical fluctuations) with than with and in 
Figure]^ Indeed, distributions with heavier tails typically lead to larger fluctuations. 

Remark 5.1 (On heavy tails in the idealized setting). Let us mention another setting where 
it can be analytically checked that the distribution of the estimator p has heavy tails in some 
regime. In a one-dimensional case (or when the committor function is used as a reaction 
coordinate), one can show that the AMS algorithm can be related to the so-called exponential 
case, namely when H(X) is distributed with exponential law with parameter one (see for 
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p 

Pn 

No 

Afo,large 

Pn 

—A^o,small 

Pn 

No 

Afo,large 

Pn 

—A^Oi small 

Pn 

8.67 

1.7e-09 

1000 

1.3e-e-06 

1.5e-09 

100 

4.0e-06 

1.6e-09 


8.67 

1.3e-09 

1000 

5.7e-06 

4.2e-10 

100 

5.2e-05 

5.1e-10 


8.67 

1.8e-09 

1000 

8.0e-06 

4.8e-10 

100 

6.5e-05 

7.3e-10 


9.33 

3.2e-10 

1000 

3.0e-07 

2.7e-10 

100 

l.Oe-06 

3.0e-10 


9.33 

3.8e-10 

1000 

1.9e-06 

6.4e-ll 

100 

1.8e-05 

8.2e-ll 


9.33 

5.2e-10 

1000 

2.7e-06 

7.2e-ll 

100 

2.5e-05 

l.Oe-10 


10 

6.2e-ll 

1000 

8.9e-08 

4.8e-ll 

100 

4.0e-07 

5.6e-ll 


10 

1.3e-10 

1000 

7.0e-07 

9.1e-12 

100 

7.2e-06 

1.3e-ll 


10 

6.5e-ll 

1000 

3.0e-07 

l.Oe-11 

100 

3.0e-06 

1.5e-ll 


Table 4: Comparisons of the empirical averages with partial empirical averages obtained 
over the Nq largest and N — Nq smallest values among N realizations. 


example M)- It is then easy to show that for k = 1, and in the regime where n^ep +oo 
and the target probability is p = exp(—cr^nrep) for some fixed cr > 0, then 2 converges in 
distribution to the log-normal distribution exjp{aZ — (2) where Z ~ AA(0,1) is a standard 

Gaussian random variable (see [2l Proposition 3.4]). In particular, p has heavy tails (the 
median exp(— (T^/2) of this distribution is smaller than its expectation 1, so that for a large a 
the empirical mean under-estimates the expectation). 


5.2.4 Fluctuations induced by the two channels 

In this section, we compare the results when using two reaction coordinates: (norm to niA) 

and (abscissa). Since the typical behavior we observe in Figure]^ and in Table is the 
same for and we do not repeat the analysis for 

As explained above, there are two possible channels for the reactive trajectories going from 
Alo B: the upper channel and the lower channel. For each realization m, one can distinguish 
the contributions to the estimator pm of the replicas going through the upper channel and 
the ones going through the lower channel. In the following, for a given path, the trajectory is 
associated to the upper (resp. lower) channel if the first hitting point of the y-axis is such that 
y > 0.5 (resp. such that y < 0.5). More precisely, let us define ni(x,y) = x and Ii 2 {x,y) = y 
for any {x, y) G M?. For a replica X = {Xt)t£n such that r = inf {f G N : IIi (Xt) > 0} < oo, 
X G Upper if 112 (^r) > 0.5 and X G Lower if 112 (^r) < 0.5. 

For each run of the algorithm, we compute the three following quantities: 

• the number of replicas which reach B before A: 


M" = 


E 


^iter) ^ 


• the number of replicas which reach B before A and go through the upper channel: 


jyjS,upper 


E 


^Ts(X("’'5iter))<TA(X("’'3iter))lx(""<3iter)eUpper 
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• the number of replicas which reach B before A and go through the lower channel: 


M 


S,lower _ 


E 


(X("''3iter) )<Ta (X("’'3iter) ) SLower 


nG/( 


(Qiter) 


Notice that and that / 0 is equivalent to p / 0. When needed, 

we explicitly indicate the dependence of these quantities on the realization by a lowerscript 
m: for m G |1, ■ • • ,-N}, we thus denote M^, M^’'^pp®’' and the m-th realization of 

M^, M^’^PP®'' and 

Let us introduce the set £n = '■ Pm 7^ 0} of realizations which lead to a non zero p and 

the proportion i?jv = /N of such realizations. We now divide the realizations in 8^ 

into three disjoint subsets, with associated proportions. 

• All replicas reaching B before A go through the upper channel: 

^-PP- = \m€£N- = o| and p"PP'^'^ = . 

I J cardcv 


• All replicas reaching B before A go through the lower channel: 


= {m££N- M^’^PP"’’ = 0} and 


cardf]^"'^" 
card Sn 


• Both channels are used by the replicas reaching B before A 

= £n \ (sT" U and = 


card 
card £n 


Obviously, /9™“ = 1 — Finally, we dehne conditional estimators for p associated 

with the partition of £isf dehned above: 


-upper ^ Emgg-PP-Pm 

card£:;(fP"'^ ’ 


/ lower Pm . mix Pm 

^rrt^c^ and 

card£:]^'^®" cardf™^ 


Notice that 

Pj, = Rn + pT^^pT^’^ + pTpT) • 

In other words, we have separated the non-zero contributions to pj\f into (i) realizations for 
which all the replicas go through the upper channel (hrst term in the parenthesis), (ii) realiza¬ 
tions for which all the replicas go through the lower channel (second term in the parenthesis), 
and hnally (iii) realizations for which the two channels are used by the replicas (third term in 
the parenthesis). 

Contrary to pjv, the limit when ^ oo of the estimators Rn, p7^^ 

or (for a given value of Urep) depends on the choice of the reaction coordinate 

see Remark 15.51 below. 

From Table 1^ we observe that for approximately half of the realizations use exclusively 
the upper channel and the other half use the lower channel. The associated conditional 
estimators and are very close. This is not the case for only very few realizations 

go through the upper channel while the associated probability is much larger than the 

two other ones This means that a few realizations contribute a lot to the 

empirical average This explains the very large conhdence intervals observed with (in 
comparison with those observed for ^^) on Figure]^ 
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/3 

N 

Rn I 

upper 

Pn 

„mix 

Pn 

slower 

Pn 

.^rupper 

Pn 

-mix 

Pn 

-lower 

Pn 

Pn 


8.67 

2.10^ 

0.81 

0.45 

0.03 

0.52 

2.7e-09 

3.0e-09 

2.3e-09 

1.7e-09 


8.67 

2.10® 

0.99 

0.0008 

0.02 

0.98 

2.3e-06 

5.9e-10 

5.5e-10 

2.4e-09 


9.33 

4.10® 

0.72 

0.51 

0.02 

0.47 

6.2e-10 

6.3e-10 

2.5e-10 

3.2e-10 


9.33 

4.10® 

0.97 

0.0005 

0.02 

0.98 

l.Oe-06 

5.6e-ll 

9.7e-ll 

6.0e-10 


10 

6.10® 

0.62 

0.51 

0.01 

0.48 

1.5e-10 

1.4e-10 

5.2e-ll 

6.2e-ll 


10 

6.10® 

0.92 

0.0004 

0.01 

0.99 

1.4e-07 

1.5e-ll 

1.8e-ll 

6.8e-ll 


Table 5: The bi-channel case. Proportion and conditional probabilities for two reaction coor¬ 
dinates: the norm to the initial point and the abscissa (^^). 


Remark 5.2 (On /9™“). We observe that for both reaction coordinates the valne of is 
very small: on most realizations, if at least one replica reaches B before A then all replicas 
reaching B go throngh the same channel. This effect is dne to the rather small nnmber of 
replicas (namely nrep = 100). When nrep increases, we observe both npper and lower paths 
on each realization, see for instance |10) for snch experiments. 

Remark 5.3 (On the efficiency of We see in Table ^ that is smaller for the best 
reaction coordinate (in terms of flnctnations) than for Indeed, the potential V admits 
a local minimnm at x* ~ (0,1.54) with ^^(x*) ~ 1.83, which is very close to 
Notice that the inflnence of the local minimnm is mnch weaker when nsing the abscissa since 
^^(x*) = 0 which is far from z^ax = 0-9 since most of the trajectories go throngh the 
lower channel for this reaction coordinate. 

Using as a reaction coordinate, replica X going thongh the npper channel is likely to get 
trapped aronnd the npper local minimnm, and the resampling procednre may produce trajec¬ 
tories which go back to A withont increasing the level of the parent replica: as a conseqnence 
a higher rate of extinction is observed for than for 

Moreover, it may happen that a replica satisfies < ^a{X) withont hitting B 

before A] it has then no contribntion in the estimator of the probability ¥(tb < ta)- On the 
left plot on Fignrej^ we represent the rirep = 20 replicas obtained at the end of one realization 
of the algorithm where /3 = 6.33, and nsing only 6 replicas ont of 20 reach B even if all of 
them have a maximnm level larger than the stopping level Zmax- In snch a case, Pcorr = 6/20 
(see (|2T|)). 

Remark 5.4 (On the degeneracy of the branching tree). On the right plot of Fignre 
another phenomenon is illnstrated: for a small nnmber of replicas, we typically observe that 
the working replicas at the final iteration of the algorithm have only a few common ancestors. 
For example, for the realization of the algorithm represented on the right plot of Fignre]^ with 
(3 = 6.33 and nsing (abscissa) as the reaction coordinate, the nrep = 20 working replicas 
are all issned from only two ancestors, at the end of the algorithm. Of conrse, the nnmber of 
common ancestors increases when nrep is larger (see for example |10|1. 


Remark 5.5. [On the limit of Rat, p'^ 


bilities 


upper 


flower 

Pn ’ 


ool The fact that the limits 


when N 

when N goes to infinity of the proportions Rtvj Pa)™ or the conditional proba- 

p^upper pmix qii ^ is iiot ill contradiction with the nnbiasedness resnlt 


of Theorem 4.1 Indeed these qnantities are not estimators of the form (19). In particnlar. 


46 




























Figure 5: Trajectories obtained at the end of a realization of the algorithm for which /3 = 6.33, 
nrep = 20. Left: the reaction coordinate is and only 6 of the working replicas have reached 


B {Pcorr = 6/20), see Remark 5.3 Right: the reaction coordinate is and the 20 replicas are 


all issued from only two ancestors, see Remark 5.4 


there are not associated with a functional of the dynamics of the form Q. For example, 
should not be confused with the empirical estimator ^ ^ where (for a 

hxed realization m) 


P 


upper _ 


Urep — 


(X("’'3iter) )<Ta (X ("’<3^61 ) ) ^Xf^’^iter) SUpper 

- KW ( 1 


n 


n 


rep 


rep 

n 


rep 


n 


rep 


X] ^TB(X("-<5iter))<TA(X("’<3iter))lx(’* ’ *3iter )g Upper 

r(Qiter) / 


which is an unbiased estimator of P (Tb(X) < T^(X) and X G Upper). The (unbiased) esti¬ 
mator of P(T^(X) < Tyr(X) and X G Lower) is dehned similarly, as well as the empir¬ 
ical average ^ Yll=i Pm"®'-_ _ 

Table contains the values of and computed using the same realizations 

as above. We include the values of the width and of the conhdence intervals 


associated with the Monte-Carlo procedure (see (39)) 



/5 

N 

flower 

Pn 

slower 

On 

upper 

Pn 

^upper 

Pn 

8.67 

2.10® 

5.2e-10 

5.1e-ll 

1.2e-09 

4.4e-ll 

1.7e-09 


8.67 

2.10® 

5.4e-10 

5.0e-ll 

1.9e-09 

3.5e-09 

2.4e-09 


9.33 

4.10® 

9.1e-ll 

9.4e-12 

2.3e-10 

7.5e-12 

3.2e-10 


9.33 

4.10® 

8.3e-ll 

7.9e-12 

5.2e-10 

7.7e-10 

6.0e-10 


10 

6.10® 

1.6e-ll 

3.0e-12 

4.6e-ll 

1.8e-12 

6.2e-ll 


10 

6.10® 

1.8e-ll 

2.2e-12 

5.0e-ll 

7.0e-ll 

6.8e-ll 


Table 6: The bi-channel case. Estimation of upper and lower channels probabilities for two 
reaction coordinates: the norm to the initial point (^^) and the abscissa (^^). 

We observe good agreement between the values computed with the two reaction coordi¬ 
nates, as predicted by the unbiasedness result of Theorem |4.1| The estimates of the probability 
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P (Ts(^) < T^(X) and X G Lower) and the corresponding empirical variances obtained with 
both reaction coordinates are very close. However, when using the abscissa we see that 
the conhdence interval is much larger than the one when 

using the variance is larger because taki^ the upper channel is less likely when using 
Moreover, if we plot the equivalent of Figure Wfor we observe the same kind of behavior 

for the evolution of the conhdence interval related with as function of the number of 

realizations iV: pjy jump at the same values of N. 

5.3 The second two-dimensional example 

Let us hnally consider another example in dimension two, already used in j^. Our aim is 
to show that the very large huctuations observed with some reaction coordinates in the hrst 
two-dimensional example are related to the existence of multiple pathways from Alo B. This 
is actually very much related to some discussions in the paper m about the origins of the 
so-called “apparent bias” which is observed with splitting algorithms. 

In this example, one parameter governs the shape of the potential. The choice of the 
reaction coordinate and the statistical behavior of the estimator strongly depend on its value. 
One goes from a situation where the estimator has the same statistical behavior whatever 
the reaction coordinates (when there is only one pathway from A to B) to a situation where 
the estimation using one of the reaction coordinate deteriorates for too small Monte-Carlo 
sample sizes (when two pathways link A to H), even though the estimated probability is 
approximately the same in both situations. 


5.3.1 The model 

The second example is inspired by the space discretization of the Allen-Cahn equation (see 0 ). 
The dynamics is again the overdamped Langevin equation (41) discretized using the Euler- 
Maruyama method. The potential function depends on a parameter 7 > 0 and is given 
by 

£'y{x, y) = i{x -yf + \ {V (x) + V (y)), 


where V(z) = ^^ is a double-well potential. 




Figure 6 : Plot of the potential function for the discretized Allen-Cahn problem. Left: 7 = 1; 
right: 7 = 0 . 1 . 


48 


















For any value of 7 > 0, there are two global minima niA = {xa^Va) = (~1) ~1) and 
niB = {xb^Vb) = (1)1)- Moreover, this model exhibits bifurcations with respect to the 
parameter 7 . When 7 > 1/8, (0,0) is the only saddle point, whereas for 7 < 1/8 the latter 
point degenerates into a local maximum and additional saddle points appear (as well as local 
minima if 7 is further decreased). 

The AMS algorithm is tested with four reaction coordinates (the hrst three ones being the 
same as in Section 5.2): 


1 . the norm to the initial point mA'- ^^{x,y) = \/{x — xjCf' + (y — I/a)^) 

2 . the norm to the final point mB: ^‘^{x, y) = ^^{xb^ ys) — -\/(x — xs)^ + (y — yB^, 

3. the abseissa: ^^{x,y) = x, 


4. the magnetization: ^^(x,y) = (x + y)/2. 


The fourth reaction coordinate is called the magnetization because of its interpretation in 
the original Allen-Cahn problem. In the hgures below, we associate a color to each reaction 
coordinate: green for red for blue (line with circles) for and cyan for 

For the reaction coordinate the maximum level used in the stopping criterion of the 
algorithm is denoted by the simulations, the following values are used: = -^^max = 

\/7T and 4 ax = 4 ax = 0-9- 
As above. 


f A = B{mA,p) = |(x,y) G : ^/(x- xa)^ + {v - VaY < p} 

= B{mB,p) = |(x,y) G : y/{x - xbY + {y - ysY < p} , 

with p = 0.05. Notice that for i G {1,2,3,4}, we have B C ('C*)~^(]^max) +oo[) (see ([Io|)). 

The deterministic initial condition is Xq = xq = (—0.9, —0.9), and we always take k = 1. 
By default, the number of replicas is taken equal to Urep = 100, and the empirical averages 
are computed with N = 10® independent realizations. When 7 = 0.1, we also take n^ep = 10 
(with N = 6.10®) and Urep = 1000 (with N = 10®). Notice that we have also tested the 

algorithm in the latter case when A: > 1: since we observe the same kind of behavior as for 

/c = 1 , we do not present the results of these numerical simulations. 

5.3.2 Simulations for 7 = 1 and (3 G {10,20,40,80} 

Let US hrst consider the case 7 = 1 . In this situation there is only one reactive path to 
go from A to B, going through the saddle point (0,0). Let ns consider the following values 
for the inverse temperature (3 G {10,20,40,80}. We plot on Figure]^ the evolution of the 
conhdence interval for the estimator p as a function of the number of independent realizations, 
for rirep = 100. We observe that the conhdence intervals overlap, and that the statistical 
huctuations are very similar whatever the reaction coordinate. 

For comparison, the results obtained with a standard direct Monte-Carlo estimation are 
given in Table 0 for /3 G {10,20,40} with N = 6.10® realizations. The results are consistent 
with those obtained by the AMS algorithm. For fi = 80, the probability is very small and 
we were not able to get a reliable result by standard direct Monte-Carlo simulations with a 
reasonable number of realizations. 
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Figure 7: Evolution as a function of N of the empirical mean and of the associated 95% 
confidence intervals — 6N/‘^,Pn + with 7 = 1 . Upper left: (3 = 10, upper right: 

(3 = 20, lower left: (3 = 40, lower right (3 = 80. 


/3 

10 

20 

40 

Pn 

2.755e-2 

2.062e-3 

1.582e-5 

6n 

0.003e-2 

0.007e-3 

0.063e-5 


Table 7: Standard direct Monte-Carlo estimation with N = 6.10® realizations. 













































5.3.3 Simulations for 7 = 0.1 and /3 = 80 


When the parameter 7 is smaller than 1/8, the point (0,0) is no longer a saddle point (but 
instead a local maximum). In this case, there are two saddle points on the line = 0 (they 
are symmetric with respect to (0,0)). We take 7 = 0.1 and represent on Figure]^ the evolution 
of the confidence interval for the estimator p as a function of the number N of independent 
runs of the algorithm and for different values of Urep- n^ep = 10 (with up to = 6 . 10 ® 
realizations), nrep = 100 (with up to = 10 ® realizations) and n^ep = 1000 (with up to 
N = 10® realizations). The inverse temperature is fixed to /3 = 80. 

We observe that the statistical fluctuations of the estimator for the reaction coordinate 
(abscissa) are much larger than for the three other ones. The estimator is unbiased, but a large 
number of realizations is required to get a significant confidence interval. By comparing with 
the results obtained in the Section 5.2, we thus conclude that the origin of these fluctuations 
is related to the existence of two pathways from A to B, rather than to the smallness of the 
estimated probability for example. In fact, the two pathways do not play symmetric roles 
when using the third reaction coordinate, although they play symmetric roles when using the 
other ones. This is in agreement with some discussions in the paper [2] about the origins of 
the so-called “apparent bias” which is observed with splitting algorithms. 


5.4 Conclusions and practical recommendations 

Let us summarize our hndings on these numerical simulations. 

• We always observe that for sufficiently large values of N (number of independent Monte 
Carlo simulations), the confidence intervals of the estimator p overlap, whatever n^ep, 
fe or C This is in accordance with our theoretical result on the unbiasedness of this 
estimator. 


We observe numerically in our two-dimensional simulations that the estimator p has a 
heavy tail so that very few realizations contribute a lot to the empirical average. This 
explains why for some choices of the reaction coordinate the efficiency of the estima¬ 
tion may be very poor since the empirical average converges very slowly to its limit 
(namely N should be taken sufficiently large to obtain significant results). Actually, as 
explained in Remark 5.1, even in some idealized setting where the reaction coordinate is 
the committor function, in some small probability regime, the estimator has a log-normal 
distribution, and thus has heavy tails. 


• In multiple channel cases (namely when multiple pathways exist from A to B), one may 
observe non-overlapping empirical confidence intervals of the estimator for different reac¬ 
tion coordinates if the number of independent realizations N is too small. This is related 
to the fact that very large contributions to the average of the estimator are associated 
with trajectories going through very unlikely (for the considered reaction coordinate and 
value of nrep) channels. This is a known phenomenon for splitting algorithms in general, 
see where it is referred to as “apparent bias”. In particular, a good reaction coordi¬ 
nate in a multiple channel case is such that, conditionally to reach a certain maximum 
level 2 , the relative likelihood of the channels used by the paths to reach this maximum 
level does not depend too much on For example, a reaction coordinate close to the 
committor function is a good candidate to achieve this purpose. This opens the route 
to adaptive algorithms, where the reaction coordinate would be updated in order to get 
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Figure 8 : Evolution as a function of N of the empirical mean and of the associated 95% 
confidence intervals \p^ — Pn + ^n/‘^] with 7 = 0.1 and j3 = 80. Upper left rirep = 10, 
upper right n^ep = 100 , lower n^ep = 1000 . 
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closer and closer to the committor function as long as successive AMS algorithms are 
launched (see |10|). We intend to investigate this direction in future works. 

As a conclusion to these numerical results, we thus recommend the following in order to 
get reliable estimates of the probability < ta) with the AMS algorithm. 

• One should be careful in the implementation of the splitting and branching steps, in 
particular in the treatment of replicas which have the same maximum level and in the 
dehnition of the branching point in the resampling procedure. For correct implementa¬ 
tions, unbiased estimators can be built, and the general framework of Section yields 
many variants for the algorithm. 

• Thanks to the unbiasedness property, one should check the independence of the com¬ 
puted probability on the choice of the parameters: the number of replicas n^ep, the 
minimum number of resampled replicas k and the reaction coordinates In particular, 
we recommend to perform simulations with various reaction coordinates and to set the 
minimal number of independent realizations such that the empirical conhdence intervals 
overlap. 

• Thanks to the unbiasedness property, one can perform many independent realizations of 
the algorithm with a relatively small number of replicas, instead of using a few indepen¬ 
dent realizations with a large number of replicas. Indeed, assume that we are in a regime 
where the variance scales like —this is the case for instance in the so-called ideal 
case for sufficiently large rirep and N, see [5]. Since the parallelization of independent 
runs of the algorithm is trivial, for a hxed product n^ep x N (namely for a hxed CPU 
cost), the strategy with less replicas is thus much more interesting in terms of wall-clock 
time (which scales like rirep) than the strategy with more replicas. 
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